Skip to content

Discover BCM Data Exports CUR 2.0 buckets for IAM policy#110

Open
silvexis wants to merge 6 commits into
developfrom
feature/discovery-cur2-data-exports
Open

Discover BCM Data Exports CUR 2.0 buckets for IAM policy#110
silvexis wants to merge 6 commits into
developfrom
feature/discovery-cur2-data-exports

Conversation

@silvexis

Copy link
Copy Markdown
Member

Summary

Extends the discovery Lambda's CUR bucket discovery to also enumerate BCM Data Exports (CUR 2.0) S3 destination buckets, so the master-payer IAM policy grants s3:Get*/s3:List* on them alongside classic CUR buckets.

  • coeffects_bcm_data_exports calls bcm-data-exports:ListExports (paginated) then GetExport per export to read DestinationConfigurations.S3Destination.S3Bucket.
  • get_all_local_cur_bucket_names unions classic CUR report buckets with the data-export buckets, keeping only locally-owned ones, and feeds them into MasterPayerBillingBucketArns.
  • The DiscoveryFunction execution role gains bcm-data-exports:ListExports and bcm-data-exports:GetExport. (The cross-account role policies already grant bcm-data-exports:Get*/List*.)
  • Graceful degradation: an AccessDeniedException from BCM Data Exports logs a warning and falls back to classic CUR buckets.

Behavior notes

  • IAM bucket access is type-agnostic — access is granted to the destination bucket of every export, regardless of type.
  • Ingest remains COST_AND_USAGE_REPORT-only — CUR 2.0 is not wired into primary-bucket / BillingReportFormat selection; the classic CUR still drives what CloudZero ingests.

Testing

pytest — 18 passed, 89% coverage; flake8 clean. New tests cover: local CUR 2.0 bucket included, all export buckets covered regardless of type, remote bucket excluded, and graceful degradation on access-denied.

Discovery now calls bcm-data-exports ListExports/GetExport to find the
S3 destination buckets of CUR 2.0 exports and folds every export's
bucket into MasterPayerBillingBucketArns. Bucket access is granted
type-agnostically (only COST_AND_USAGE_REPORT is ingested), and the
discovery Lambda role gains the two bcm-data-exports actions.

Author: Erik Peterson <erik@cloudzero.com>
@silvexis silvexis requested a review from a team as a code owner June 15, 2026 19:06
@silvexis silvexis requested a review from khill2018 June 15, 2026 19:06
Comment thread services/discovery/src/app.py Outdated
Comment thread services/discovery/tests/unit/test_app.py Outdated

@khill2018 khill2018 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved, but I agree with greptile about the test additions. Let's add those in (happy to do it myself) and then merge! Edit to add I see Erik is on a plane going to make those changes myself and have somebody additionally review the PR

Address review feedback on BCM Data Exports discovery:
- Wrap each get_export call in its own try/except so a transient failure
  on a single export no longer drops buckets already resolved from other
  exports; the failing export is logged and skipped.
- Keep list_exports failure as graceful fallback to empty exports.
- Add tests for the multi-page list_exports NextToken pagination path and
  for per-export get_export error isolation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Cloudzero Cloudzero deleted a comment from greptile-apps Bot Jun 16, 2026
@khill2018

Copy link
Copy Markdown
Contributor

@greptile

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown

Greptile Summary

This PR extends the discovery Lambda to enumerate BCM Data Exports (CUR 2.0) S3 destination buckets and fold them into MasterPayerBillingBucketArns, so the master-payer IAM policy grants s3:Get*/List* on both classic CUR and CUR 2.0 buckets. It also removes the toolz pipeline style in favour of straight procedural code, removes the deprecated audit/cloudtrail-owner account types, replaces the inline Custom::CostAndUsageReport Lambda with the native AWS::CUR::ReportDefinition resource, and flattens the connected-account stack from six nested stacks to one.

  • list_data_export_bucket_names() paginates bcm:ListExports and resolves each export ARN with bcm:GetExport; per-export errors are isolated so a single GetExport failure doesn't drop the rest.
  • all_local_billing_bucket_names() unions classic CUR report buckets with BCM Data Export buckets, keeping only locally-owned ones; graceful AccessDeniedException degradation falls back to classic CUR buckets.
  • DiscoveryFunction execution role gains bcm-data-exports:ListExports + bcm-data-exports:GetExport; the new account_resources.yaml nested template replaces the old per-account-type templates.

Confidence Score: 4/5

Safe to merge with one fix: a mid-pagination ListExports failure discards all ARNs from earlier pages instead of using them.

The list_data_export_bucket_names function wraps the entire list_exports pagination loop in a single try/except. If the first page succeeds but any subsequent page raises, the except block executes return [], silently discarding all ARNs collected from previous pages. For accounts with enough exports to span multiple pages, this means their export buckets are missing from the IAM policy until the next stack update. Everything else looks correct.

services/discovery/src/app.py — the list_data_export_bucket_names pagination error handling.

Important Files Changed

Filename Overview
services/discovery/src/app.py Major refactor from toolz-pipeline style to procedural; adds list_data_export_bucket_names() for BCM Data Exports. One issue: mid-pagination list_exports failure discards all ARNs collected from earlier pages.
services/discovery/tests/unit/test_app.py Comprehensive test coverage for BCM Data Exports paths: local bucket included, all-export-types covered, remote bucket excluded, access-denied degradation, pagination across two pages, per-export isolation, and missing-destination guard.
services/discovery/template.yaml Adds bcm-data-exports:ListExports and bcm-data-exports:GetExport to the DiscoveryFunction execution role; removes deprecated cloudtrail:DescribeTrails.
services/account_type/account_resources.yaml New template merging former resource_owner + master_payer templates; replaces Custom::CostAndUsageReport Lambda with native AWS::CUR::ReportDefinition (us-east-1 gated). IAM policies and Conditions look correct.
services/notification/src/app.py Removed toolz pipeline and CFN stack-scraping; notification now builds payload directly from event properties. Fixed reactor payload contract is preserved including deprecated cloudtrail/audit fields.
services/connected_account.yaml Flattened to a single nested AccountResources stack; inlines Discovery and Notification Lambdas; removes deprecated audit/cloudtrail-owner nested stacks. Adds Stage parameter for prod/dev callback URL selection.

Comments Outside Diff (1)

  1. services/discovery/src/app.py, line 2398-2409 (link)

    P1 Mid-pagination ListExports failure silently discards already-collected ARNs

    If list_exports succeeds on page 1 but raises on page 2, the export_arns list that was populated from page 1 is owned by the try block. When the except fires, return [] discards those page-1 ARNs entirely — so no export buckets at all end up in the IAM policy, even though half the exports were successfully enumerated.

    The fix is to move the exception handling inside the loop so a page failure breaks out with partial results rather than discarding everything. The per-export get_export isolation pattern already in the code below is the right model to follow here.

Reviews (7): Last reviewed commit: "Drop CUR-format log line that trips Code..." | Re-trigger Greptile

@khill2018 khill2018 requested a review from qiuz-cz June 16, 2026 14:43
Comment thread services/discovery/src/app.py Outdated
next_token = response.get('NextToken')
if not next_token:
break
except ClientError:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except ClientError may be too narrow to deliver the intended graceful degradation. Two non-ClientError failure modes can bypass it:

  1. Import-time client construction (line 19): boto3.client('bcm-data-exports', ...) runs at module scope. If the Lambda runtime's bundled botocore predates this relatively new service, it raises UnknownServiceError (a botocore.exceptions error, not ClientError) on cold start — before any try/except runs — crashing the whole custom resource.
  2. Call-time connectivity — e.g. EndpointConnectionError is a BotoCoreError, not a ClientError.

Suggestion: confirm the runtime's botocore version actually supports bcm-data-exports, and consider broadening the catch here to also include BotoCoreError so unexpected client-side failures still fall back to classic CUR rather than failing the stack.

Comment thread services/discovery/src/app.py Outdated
try:
export = bcm.get_export(ExportArn=export_arn).get('Export', {})
except ClientError:
logger.warning(f'Failed to access BCM Data Exports GetExport for {export_arn}', exc_info=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer lazy %s logging over an f-string for the message arg — it defers interpolation and matches the other logger.warning(...) calls in this module:

logger.warning('Failed to access BCM Data Exports GetExport for %s', export_arn, exc_info=True)

return {'Exports': [{'ExportArn': arn, 'ExportName': arn.split('/')[-1]} for arn in export_arns]}


def _get_export_response(bucket_name):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small coverage gap: there's no test for the if bucket: guard in coeffects_bcm_data_exports (app.py:171) — i.e. an export whose DestinationConfigurations.S3Destination.S3Bucket is missing/None. A quick test asserting such an export is silently skipped (and doesn't add an empty/None bucket) would lock in that behavior.

Comment thread docs/releases/1.0.101.md
@@ -0,0 +1,24 @@
# [1.0.101](https://github.com/Cloudzero/provision-account/compare/1.0.100...1.0.101) (2026-06-15)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version (1.0.101) and date (2026-06-15) are hardcoded here. The compare-URL format suggests releases may be auto-generated (semantic-release style) — if so, a hand-written release file could conflict with or duplicate the generated one. Worth confirming this is the intended workflow for the repo.

Collapse the 6 nested stacks into one: connected_account.yaml inlines the
Discovery and Notification Lambdas and provisions account resources through a
single nested AccountResources stack (merged resource_owner + master_payer).

Replace the Custom::CostAndUsageReport Lambda with native AWS::S3::Bucket,
AWS::S3::BucketPolicy and AWS::CUR::ReportDefinition, gated on us-east-1 with
DeletionPolicy: Retain. Remove the deprecated audit and cloudtrail-owner account
types (templates, discovery detection, cloudtrail:DescribeTrails); the reactor
payload contract is unchanged, emitting null/false for the deprecated fields.

Rewrite the discovery and notification Lambdas into clear procedural flows;
notification now reads direct resource properties instead of scraping stack
outputs. Add a Stage parameter so prod/dev parents differ by one line.

Author: Erik Peterson <erik@cloudzero.com>
Comment thread services/discovery/src/app.py Fixed
Comment thread services/notification/src/app.py Fixed
- CodeQL (py/clear-text-logging-sensitive-data): stop logging the full event /
  payload / output in the discovery and notification handlers; they carry ExternalId
  and account-identifying fields. Log only non-sensitive summary fields.
- Broaden the BCM Data Exports list/get catch to BotoCoreError so connectivity
  errors degrade gracefully (per review feedback).
- Use lazy %s logging for the per-export GetExport warning.
- Add tests for an export with no destination bucket (the if-bucket guard).
- Drop now-unused toolz from discovery/notification requirements.

Author: Erik Peterson <erik@cloudzero.com>
@silvexis

Copy link
Copy Markdown
Member Author

Thanks for the reviews. Addressed the feedback in 0f5f29c (on top of the stack-flattening work):

CodeQL — clear-text logging of sensitive data (py/clear-text-logging-sensitive-data)
The discovery and notification handlers no longer log the full event / payload / output — those carry ExternalId and account-identifying fields. They now log only non-sensitive summary fields (RequestType, message_type, response status, the boolean classification flags).

@qiuz-czexcept ClientError too narrow (app.py:122)
Broadened the BCM Data Exports list/get catches to (ClientError, BotoCoreError) so connectivity errors (e.g. EndpointConnectionError) degrade gracefully. On the import-time UnknownServiceError concern: requirements.txt pins boto3>=1.34.91, and bcm-data-exports has shipped in botocore since 1.34.0, so the client constructs fine on the python3.12 runtime.

@qiuz-cz — lazy %s logging (app.py:134) ✅ Switched the per-export GetExport warning to logger.warning('... %s', export_arn, ...).

@qiuz-cz — coverage gap for the if bucket: guard ✅ Added test_handler_skips_export_with_no_destination_bucket.

@qiuz-cz — hardcoded version/date in docs/releases/1.0.101.md — confirmed: release notes here are hand-written and committed (see 1.0.95.md, 1.0.100.md); the ${...} compare-URL is just the file's own convention, not semantic-release generation. Same pattern followed for the new 1.0.102.md.

Greptile P2s (per-export error isolation + pagination test) — these were resolved by @khill2018's 50c9b31 and are preserved through the refactor (the discovery handler was rewritten; the isolation + pagination behavior and tests carried over).

Also dropped the now-unused toolz dependency from both Lambda requirements.

Heads-up @khill2018: this PR now also contains the connected-account stack flattening (6 nested stacks → 1), native AWS::CUR::ReportDefinition, and removal of the deprecated audit / cloudtrail-owner account types — see docs/releases/1.0.102.md. Your 50c9b31 discovery change was preserved through the rewrite.

Comment thread services/notification/src/app.py Fixed
The redacted logs still read tainted values: the notification payload dict
carries ExternalId (so any field read is flagged) and the discovery log
interpolated the CUR bucket name. Log a static message when posting to the
reactor, and log only the (literal) billing report format when selecting a CUR.

Author: Erik Peterson <erik@cloudzero.com>
Comment thread services/discovery/src/app.py Fixed
CodeQL's clear-text-logging 'private data' heuristic flags the variable name
billing_report_format (contains 'billing'), not its value (a literal like
'aws'). The log carried no real value beyond the format already present in the
discovery output, so remove it rather than fight the scanner.

Author: Erik Peterson <erik@cloudzero.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants