Make extractor strategy a deploy-time choice (single_pass | agentic) #34

Merged: gafnts merged 14 commits into develop from feature/agentic-deployment on Jun 7, 2026


@gafnts (Owner) commented Jun 7, 2026

Makes the extraction strategy a deploy-time choice via a new extractor_flavor variable (single_pass | agentic, default single_pass), so any environment can run either the single structured LLM call or the AgenticExtractor ReAct loop by flipping one variable. Single-pass stays the default everywhere; prod gains the agentic capability without reverting anything. See ADR-0016.

What changed

Handler: _extractor() reads EXTRACTOR_FLAVOR and builds either SinglePassExtractor or AgenticExtractor(modality="text", max_iterations=…). The return type widens to the Extractor[NDA] protocol; failures from both flavors (including the agentic non-termination ExtractionError) flow unchanged through the existing broad except → batchItemFailures redrive path.
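
A minimal sketch of the flavor dispatch described above, not the PR's actual code. The two extractor classes are placeholders, and the `extract` method name, the `build_extractor` function name, the fallback of 30 iterations, and the `ValueError` on an unknown flavor are all assumptions made for illustration; only the environment variable names and the `modality="text"` argument come from this PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Protocol


class Extractor(Protocol):
    """Stand-in for the Extractor[NDA] protocol named in the PR."""

    def extract(self, document: str) -> dict: ...  # method name is an assumption


@dataclass
class SinglePassExtractor:
    """Placeholder: the real class makes one structured LLM call."""

    def extract(self, document: str) -> dict:
        return {"flavor": "single_pass"}


@dataclass
class AgenticExtractor:
    """Placeholder: the real class runs a ReAct loop capped at max_iterations."""

    modality: str
    max_iterations: int

    def extract(self, document: str) -> dict:
        return {"flavor": "agentic", "max_iterations": self.max_iterations}


# Assumed fallback, chosen to match the agentic profile value in this PR.
DEFAULT_MAX_ITERATIONS = 30


def build_extractor(env: Mapping[str, str] = os.environ) -> Extractor:
    """Pick the extractor from EXTRACTOR_FLAVOR; single_pass when unset."""
    flavor = env.get("EXTRACTOR_FLAVOR", "single_pass")
    if flavor == "single_pass":
        return SinglePassExtractor()
    if flavor == "agentic":
        max_iterations = int(env.get("EXTRACTOR_MAX_ITERATIONS", DEFAULT_MAX_ITERATIONS))
        return AgenticExtractor(modality="text", max_iterations=max_iterations)
    # Assumption: fail loudly on a typo instead of silently falling back.
    raise ValueError(f"unknown EXTRACTOR_FLAVOR: {flavor!r}")
```

Passing the environment in as a mapping keeps the dispatch testable without patching `os.environ`, which is one way the unit tests listed under Tests could be written.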

Infra — A flavor-keyed flavor_profile in the root module drives the whole parameter envelope so re-parametrization is a one-variable flip. Only two knobs actually move with the flavor:

  • max_iterations: null for single-pass (no loop, var omitted from the function env), 30 for agentic.
  • maxReceiveCount: 3 single-pass, 2 agentic — agentic failures are mostly logic, not transient, so retrying an expensive doomed run buys nothing.

Timeout, visibility timeout, modality, and concurrency cap hold across both. Per-env tfvars set staging + local to agentic, prod to single_pass. New extractor_flavor Terraform output exposes the deployed strategy.
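
The flavor-keyed envelope can be modelled as a plain lookup table. The sketch below is a Python model of that idea, not the Terraform in this PR: the `FLAVOR_PROFILE` and `function_env` names are hypothetical, while the values (null/30 iterations, 3/2 receives) and the three environment variable names are taken from the description and the staging plan.

```python
from typing import Dict, Optional, TypedDict


class FlavorProfile(TypedDict):
    max_iterations: Optional[int]  # None means "no loop, omit the env var"
    max_receive_count: int         # SQS redrive threshold before the DLQ


# The only two knobs that move with the flavor, per the PR description.
FLAVOR_PROFILE: Dict[str, FlavorProfile] = {
    "single_pass": {"max_iterations": None, "max_receive_count": 3},
    "agentic": {"max_iterations": 30, "max_receive_count": 2},
}


def function_env(flavor: str) -> Dict[str, str]:
    """Derive the flavor-dependent Lambda env vars from one variable."""
    profile = FLAVOR_PROFILE[flavor]  # KeyError on an unknown flavor
    env = {
        "EXTRACTOR_FLAVOR": flavor,
        "SQS_MAX_RECEIVE_COUNT": str(profile["max_receive_count"]),
    }
    if profile["max_iterations"] is not None:
        env["EXTRACTOR_MAX_ITERATIONS"] = str(profile["max_iterations"])
    return env


if __name__ == "__main__":
    print(function_env("agentic"))
```

For `"agentic"` this yields the same three values the staging plan adds or changes on the extractor function; for `"single_pass"` the `EXTRACTOR_MAX_ITERATIONS` key is absent.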

Load harness — Reads extractor_flavor from the live stack so SLO 4 (latency) reports rather than gates for agentic (slow by design); flavor now appears in the report header and artifact.
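
To illustrate "reports rather than gates", here is a hedged sketch of how a harness might treat the latency SLO per flavor. Everything in it is hypothetical (the `SloResult` type, the p95 metric, the threshold argument, and the function names); the PR only states that SLO 4 is non-gating when the deployed flavor is agentic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SloResult:
    name: str
    passed: bool
    gating: bool  # False means "report only": a miss does not fail the run


def evaluate_latency_slo(flavor: str, observed_p95_s: float, threshold_s: float) -> SloResult:
    """Score the latency SLO; it gates for every flavor except agentic."""
    passed = observed_p95_s <= threshold_s
    gating = flavor != "agentic"  # agentic is slow by design, so only report
    return SloResult(name="SLO 4 (latency)", passed=passed, gating=gating)


def run_fails(results: Iterable[SloResult]) -> bool:
    """A run fails only when a gating SLO misses its target."""
    return any(r.gating and not r.passed for r in results)
```

With this shape, a slow agentic run still records the miss in the report but does not fail the harness, while the same latency under single_pass would.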

Tests — Unit tests covering single-pass default, agentic construction, and the max_iterations env default.

Docs — ADR-0016 (new) plus ADR-0015 post-run corrections; README queue / extractor / alarm tables updated for flavor-dependent values; quality-gates CI badge + workflow rename.

Notes

The agentic staging characterization run (the deployed agency-premium measurement) is user-triggered and still pending — this PR ships the capability, not the run.


github-actions Bot commented Jun 7, 2026


Terraform Plan · staging

terraform -chdir=infra plan -var-file=envs/staging.tfvars
module.publisher.data.archive_file.publisher: Reading...
module.uploader.data.archive_file.presigner: Reading...
module.uploader.data.archive_file.presigner: Read complete after 0s [id=c6003612e7d0a8992056dd501c589d957f338842]
module.publisher.data.archive_file.publisher: Read complete after 0s [id=a8a217aa670dbf877cef8e62ea09f239480a3857]
data.aws_ecr_repository.extractor: Reading...
module.extractor.aws_cloudwatch_log_group.extractor: Refreshing state... [id=/aws/lambda/agentic-kie-deploy-staging-extractor]
module.analytics.aws_glue_catalog_database.results: Refreshing state... [id=009160074575:agentic-kie-deploy_staging_analytics]
module.publisher.aws_sqs_queue.publisher_dlq: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-publisher-dlq]
module.uploader.aws_cloudwatch_log_group.api_access: Refreshing state... [id=/aws/apigateway/agentic-kie-deploy-staging-uploader]
module.table.aws_dynamodb_table.results: Refreshing state... [id=agentic-kie-deploy-staging-results]
module.uploader.aws_cloudwatch_log_group.presigner: Refreshing state... [id=/aws/lambda/agentic-kie-deploy-staging-uploader]
module.alarms.aws_sns_topic.alarms: Refreshing state... [id=arn:aws:sns:us-east-1:009160074575:agentic-kie-deploy-staging-alarms]
data.aws_secretsmanager_secret.langsmith: Reading...
module.queue.aws_sqs_queue.extraction_dlq: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-extraction-dlq]
module.uploader.data.aws_iam_policy_document.assume_role: Reading...
module.uploader.data.aws_iam_policy_document.assume_role: Read complete after 0s [id=666922913]
module.uploader.aws_apigatewayv2_api.uploader: Refreshing state... [id=szclvwcwg9]
data.aws_secretsmanager_secret.llm_provider: Reading...
data.aws_secretsmanager_secret.langsmith: Read complete after 0s [id=arn:aws:secretsmanager:us-east-1:009160074575:secret:agentic-kie-deploy/staging/langsmith-1RYUWW]
module.publisher.data.aws_iam_policy_document.assume_role: Reading...
module.publisher.data.aws_iam_policy_document.assume_role: Read complete after 0s [id=666922913]
module.publisher.aws_cloudwatch_log_group.publisher: Refreshing state... [id=/aws/lambda/agentic-kie-deploy-staging-publisher]
data.aws_secretsmanager_secret.llm_provider: Read complete after 0s [id=arn:aws:secretsmanager:us-east-1:009160074575:secret:agentic-kie-deploy/staging/llm-provider-Q0xOa6]
data.aws_caller_identity.current: Reading...
data.aws_caller_identity.current: Read complete after 0s [id=009160074575]
module.extractor.data.aws_iam_policy_document.assume_role: Reading...
module.extractor.data.aws_iam_policy_document.assume_role: Read complete after 0s [id=666922913]
module.publisher.data.aws_iam_policy_document.publisher_dlq: Reading...
module.publisher.data.aws_iam_policy_document.publisher_dlq: Read complete after 0s [id=4018107734]
module.uploader.aws_iam_role.presigner: Refreshing state... [id=agentic-kie-deploy-staging-uploader-exec]
module.queue.data.aws_iam_policy_document.extraction_dlq: Reading...
module.queue.data.aws_iam_policy_document.extraction_dlq: Read complete after 0s [id=2474462081]
module.queue.aws_sqs_queue.extraction: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-extraction]
module.publisher.aws_iam_role.publisher: Refreshing state... [id=agentic-kie-deploy-staging-publisher-exec]
module.extractor.aws_iam_role.extractor: Refreshing state... [id=agentic-kie-deploy-staging-extractor-exec]
module.publisher.aws_sqs_queue_policy.publisher_dlq: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-publisher-dlq]
module.alarms.aws_sns_topic_subscription.email[0]: Refreshing state... [id=arn:aws:sns:us-east-1:009160074575:agentic-kie-deploy-staging-alarms:ac4f0950-0494-42e2-a5cc-c0aa93337f22]
module.queue.aws_sqs_queue_policy.extraction_dlq: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-extraction-dlq]
data.aws_ecr_repository.extractor: Read complete after 0s [id=agentic-kie-deploy-staging-extractor]
module.publisher.aws_cloudwatch_metric_alarm.dlq_messages_visible: Refreshing state... [id=agentic-kie-deploy-staging-publisher-dlq-messages-visible]
module.queue.aws_cloudwatch_metric_alarm.dlq_messages_visible: Refreshing state... [id=agentic-kie-deploy-staging-extraction-dlq-messages-visible]
module.analytics.aws_s3_bucket.results_logs: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9-logs]
module.analytics.aws_s3_bucket.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.ingestion.aws_s3_bucket.ingestion_logs: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e-logs]
module.ingestion.aws_s3_bucket.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.analytics.aws_s3_bucket.athena_results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a2142-athena-results]
module.uploader.aws_apigatewayv2_stage.default: Refreshing state... [id=$default]
module.analytics.aws_s3_bucket_public_access_block.results_logs: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9-logs]
module.analytics.aws_s3_bucket_ownership_controls.results_logs: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9-logs]
module.analytics.aws_s3_bucket_server_side_encryption_configuration.results_logs: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9-logs]
module.analytics.aws_s3_bucket_lifecycle_configuration.results_logs: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9-logs]
module.analytics.data.aws_iam_policy_document.results_tls_only: Reading...
module.analytics.data.aws_iam_policy_document.results_tls_only: Read complete after 0s [id=1148671339]
module.analytics.aws_s3_bucket_logging.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_glue_catalog_table.extractions: Refreshing state... [id=009160074575:agentic-kie-deploy_staging_analytics:extractions]
module.analytics.aws_s3_bucket_ownership_controls.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_s3_bucket_public_access_block.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_s3_bucket_lifecycle_configuration.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_s3_bucket_notification.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_s3_bucket_versioning.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_s3_bucket_server_side_encryption_configuration.results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.analytics.aws_s3_bucket_policy.results_tls_only: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a21425a107d9]
module.ingestion.aws_s3_bucket_ownership_controls.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.ingestion.data.aws_iam_policy_document.ingestion_tls_only: Reading...
module.ingestion.aws_s3_bucket_lifecycle_configuration.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.ingestion.data.aws_iam_policy_document.ingestion_tls_only: Read complete after 0s [id=3357136566]
module.ingestion.aws_s3_bucket_versioning.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.ingestion.aws_s3_bucket_server_side_encryption_configuration.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.ingestion.aws_s3_bucket_notification.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.ingestion.aws_s3_bucket_public_access_block.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.publisher.data.aws_iam_policy_document.publisher: Reading...
module.publisher.data.aws_iam_policy_document.publisher: Read complete after 0s [id=170943819]
module.analytics.aws_s3_bucket_ownership_controls.athena_results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a2142-athena-results]
module.analytics.aws_s3_bucket_lifecycle_configuration.athena_results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a2142-athena-results]
module.analytics.aws_s3_bucket_server_side_encryption_configuration.athena_results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a2142-athena-results]
module.analytics.data.aws_iam_policy_document.athena_results_tls_only: Reading...
module.analytics.data.aws_iam_policy_document.athena_results_tls_only: Read complete after 0s [id=201772567]
module.analytics.aws_s3_bucket_public_access_block.athena_results: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a2142-athena-results]
module.analytics.aws_athena_workgroup.results: Refreshing state... [id=agentic-kie-deploy-staging-analytics]
module.ingestion.aws_s3_bucket_server_side_encryption_configuration.ingestion_logs: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e-logs]
module.ingestion.aws_s3_bucket_lifecycle_configuration.ingestion_logs: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e-logs]
module.ingestion.aws_s3_bucket_logging.ingestion: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.ingestion.aws_s3_bucket_ownership_controls.ingestion_logs: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e-logs]
module.ingestion.aws_s3_bucket_public_access_block.ingestion_logs: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e-logs]
module.ingestion.aws_s3_bucket_policy.ingestion_tls_only: Refreshing state... [id=agentic-kie-deploy-staging-ingestion-a57cea95f46fa68e]
module.publisher.aws_iam_role_policy.publisher: Refreshing state... [id=agentic-kie-deploy-staging-publisher-exec:publisher]
module.analytics.aws_s3_bucket_policy.athena_results_tls_only: Refreshing state... [id=agentic-kie-deploy-staging-extractions-6f57a2142-athena-results]
module.queue.aws_cloudwatch_event_rule.object_created: Refreshing state... [id=agentic-kie-deploy-staging-extraction-object-created]
module.extractor.data.aws_iam_policy_document.extractor: Reading...
module.extractor.data.aws_iam_policy_document.extractor: Read complete after 0s [id=350660783]
module.uploader.data.aws_iam_policy_document.presigner: Reading...
module.uploader.data.aws_iam_policy_document.presigner: Read complete after 0s [id=1796615201]
module.extractor.aws_iam_role_policy.extractor: Refreshing state... [id=agentic-kie-deploy-staging-extractor-exec:extractor]
module.uploader.aws_iam_role_policy.presigner: Refreshing state... [id=agentic-kie-deploy-staging-uploader-exec:presigner]
module.publisher.aws_lambda_function.publisher: Refreshing state... [id=agentic-kie-deploy-staging-publisher]
module.uploader.aws_lambda_function.presigner: Refreshing state... [id=agentic-kie-deploy-staging-uploader]
module.extractor.aws_lambda_function.extractor: Refreshing state... [id=agentic-kie-deploy-staging-extractor]
module.queue.aws_cloudwatch_event_target.extraction_queue: Refreshing state... [id=agentic-kie-deploy-staging-extraction-object-created-terraform-20260531232716413600000001]
module.queue.aws_sqs_queue_policy.extraction: Refreshing state... [id=https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-extraction]
module.publisher.aws_cloudwatch_metric_alarm.errors: Refreshing state... [id=agentic-kie-deploy-staging-publisher-errors]
module.publisher.aws_lambda_event_source_mapping.publisher: Refreshing state... [id=907f8c3c-f07c-4e39-a3b7-0cc41ca353f1]
module.publisher.aws_cloudwatch_metric_alarm.throttles: Refreshing state... [id=agentic-kie-deploy-staging-publisher-throttles]
module.extractor.aws_cloudwatch_metric_alarm.throttles: Refreshing state... [id=agentic-kie-deploy-staging-extractor-throttles]
module.extractor.aws_lambda_event_source_mapping.extraction: Refreshing state... [id=d0b595ee-0694-4ed4-8f29-fc8865eff790]
module.extractor.aws_cloudwatch_metric_alarm.errors: Refreshing state... [id=agentic-kie-deploy-staging-extractor-errors]
module.uploader.aws_lambda_permission.apigw_invoke: Refreshing state... [id=AllowAPIGatewayInvoke]
module.uploader.aws_apigatewayv2_integration.presigner: Refreshing state... [id=oh6x29e]
module.uploader.aws_cloudwatch_metric_alarm.throttles: Refreshing state... [id=agentic-kie-deploy-staging-uploader-throttles]
module.uploader.aws_cloudwatch_metric_alarm.errors: Refreshing state... [id=agentic-kie-deploy-staging-uploader-errors]
module.uploader.aws_apigatewayv2_route.uploads: Refreshing state... [id=hp13u71]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place
 <= read (data resources)

Terraform will perform the following actions:

  # module.alarms.aws_sns_topic_subscription.email[0] will be created
  + resource "aws_sns_topic_subscription" "email" {
      + arn                             = (known after apply)
      + confirmation_timeout_in_minutes = 1
      + confirmation_was_authenticated  = (known after apply)
      + endpoint                        = "gafnts@gmail.com"
      + endpoint_auto_confirms          = false
      + filter_policy_scope             = (known after apply)
      + id                              = (known after apply)
      + owner_id                        = (known after apply)
      + pending_confirmation            = (known after apply)
      + protocol                        = "email"
      + raw_message_delivery            = false
      + region                          = "us-east-1"
      + topic_arn                       = "arn:aws:sns:us-east-1:009160074575:agentic-kie-deploy-staging-alarms"
    }

  # module.extractor.aws_cloudwatch_metric_alarm.errors will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "errors" {
      ~ alarm_description                     = "Lambda invocations that ended in an unhandled exception. With maxReceiveCount=3 on the queue, a single bad document fires this up to three times before it lands in the DLQ — the alarm is the early-warning signal that the DLQ alarm is the confirmation of." -> "Lambda invocations that ended in an unhandled exception. A single bad document fires this once per delivery attempt before it lands in the DLQ (the alarm is the early-warning signal that the DLQ alarm is the confirmation of)."
        id                                    = "agentic-kie-deploy-staging-extractor-errors"
        tags                                  = {
            "Environment" = "staging"
        }
        # (23 unchanged attributes hidden)
    }

  # module.extractor.aws_lambda_function.extractor will be updated in-place
  ~ resource "aws_lambda_function" "extractor" {
        id                             = "agentic-kie-deploy-staging-extractor"
        tags                           = {
            "Environment" = "staging"
        }
        # (31 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              + "EXTRACTOR_FLAVOR"         = "agentic"
              + "EXTRACTOR_MAX_ITERATIONS" = "30"
              ~ "SQS_MAX_RECEIVE_COUNT"    = "3" -> "2"
                # (5 unchanged elements hidden)
            }
        }

        # (3 unchanged blocks hidden)
    }

  # module.queue.data.aws_iam_policy_document.extraction_queue will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_iam_policy_document" "extraction_queue" {
      + id            = (known after apply)
      + json          = (known after apply)
      + minified_json = (known after apply)

      + statement {
          + actions   = [
              + "sqs:SendMessage",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:sqs:us-east-1:009160074575:agentic-kie-deploy-staging-extraction",
            ]
          + sid       = "AllowEventBridgeSendMessage"

          + condition {
              + test     = "ArnEquals"
              + values   = [
                  + "arn:aws:events:us-east-1:009160074575:rule/agentic-kie-deploy-staging-extraction-object-created",
                ]
              + variable = "aws:SourceArn"
            }

          + principals {
              + identifiers = [
                  + "events.amazonaws.com",
                ]
              + type        = "Service"
            }
        }
      + statement {
          + actions   = [
              + "sqs:*",
            ]
          + effect    = "Deny"
          + resources = [
              + "arn:aws:sqs:us-east-1:009160074575:agentic-kie-deploy-staging-extraction",
            ]
          + sid       = "DenyInsecureTransport"

          + condition {
              + test     = "Bool"
              + values   = [
                  + "false",
                ]
              + variable = "aws:SecureTransport"
            }

          + principals {
              + identifiers = [
                  + "*",
                ]
              + type        = "*"
            }
        }
    }

  # module.queue.aws_cloudwatch_metric_alarm.dlq_messages_visible will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "dlq_messages_visible" {
      ~ alarm_description                     = "Any message in the DLQ means a document exhausted maxReceiveCount=3 retries. The DLQ alarm is the single source of truth for failed messages." -> "Any message in the DLQ means a document exhausted its maxReceiveCount retries (3 for single-pass, 2 for agentic). The DLQ alarm is the single source of truth for failed messages."
        id                                    = "agentic-kie-deploy-staging-extraction-dlq-messages-visible"
        tags                                  = {
            "Environment" = "staging"
        }
        # (23 unchanged attributes hidden)
    }

  # module.queue.aws_sqs_queue.extraction will be updated in-place
  ~ resource "aws_sqs_queue" "extraction" {
        id                                = "https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-extraction"
        name                              = "agentic-kie-deploy-staging-extraction"
      ~ redrive_policy                    = jsonencode(
          ~ {
              ~ maxReceiveCount     = 3 -> 2
                # (1 unchanged attribute hidden)
            }
        )
        tags                              = {
            "Environment" = "staging"
        }
        # (19 unchanged attributes hidden)
    }

  # module.queue.aws_sqs_queue_policy.extraction will be updated in-place
  ~ resource "aws_sqs_queue_policy" "extraction" {
        id        = "https://sqs.us-east-1.amazonaws.com/009160074575/agentic-kie-deploy-staging-extraction"
      ~ policy    = jsonencode(
            {
              - Statement = [
                  - {
                      - Action    = "sqs:SendMessage"
                      - Condition = {
                          - ArnEquals = {
                              - "aws:SourceArn" = "arn:aws:events:us-east-1:009160074575:rule/agentic-kie-deploy-staging-extraction-object-created"
                            }
                        }
                      - Effect    = "Allow"
                      - Principal = {
                          - Service = "events.amazonaws.com"
                        }
                      - Resource  = "arn:aws:sqs:us-east-1:009160074575:agentic-kie-deploy-staging-extraction"
                      - Sid       = "AllowEventBridgeSendMessage"
                    },
                  - {
                      - Action    = "sqs:*"
                      - Condition = {
                          - Bool = {
                              - "aws:SecureTransport" = "false"
                            }
                        }
                      - Effect    = "Deny"
                      - Principal = "*"
                      - Resource  = "arn:aws:sqs:us-east-1:009160074575:agentic-kie-deploy-staging-extraction"
                      - Sid       = "DenyInsecureTransport"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        # (2 unchanged attributes hidden)
    }

Plan: 1 to add, 5 to change, 0 to destroy.

Changes to Outputs:
  + extractor_flavor                = "agentic"

─────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't
guarantee to take exactly these actions if you run "terraform apply" now.

@gafnts gafnts merged commit d142b20 into develop Jun 7, 2026
9 checks passed
@gafnts gafnts deleted the feature/agentic-deployment branch June 7, 2026 17:48