Enhance SQS monitoring and timeout settings with documentation updates #38

Merged
gafnts merged 6 commits into develop from load/agentic-extractor on Jun 7, 2026

gafnts (Owner) commented on Jun 7, 2026

This pull request improves the reliability and accuracy of load test reporting and cleanup for the agentic flavor deployment. The main changes address issues with test harness timeouts, cleanup of incomplete documents, and the clarity of concurrency cap reporting. These updates help ensure that test results more accurately reflect the system's real behavior and reduce the risk of spurious failures due to measurement artifacts.

Load test harness and reporting improvements:

  • Increased the default timeout in the await_completion function from 600s to 900s in harness.py, ensuring the harness waits long enough for documents requiring SQS retries to reach a terminal state, thus reducing false timeouts that can fail SLO 1.
  • Enhanced the docstring for await_completion to clarify the relationship between the timeout, SQS visibility, and SLO correctness.
  • Updated the cleanup function in harness.py to skip deletion of documents still marked as timeout, preventing premature deletion that could cause phantom DLQ failures due to in-flight retries encountering missing S3 objects. Added a printout to inform the user of skipped documents.
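
The two harness changes above can be sketched roughly as follows. This is a minimal illustration, not the code in harness.py: the status strings other than `timeout`, the `get_status` and `delete_document` callables, the polling interval, and the injectable `clock`/`sleep` parameters are all assumptions made so the example is self-contained. Only the 900s default and the skip-on-timeout behavior come from the description.

```python
import time
from typing import Callable, Dict, Iterable, List

# Assumed terminal states; the real harness may use different names.
TERMINAL_STATES = {"completed", "failed"}


def await_completion(
    doc_ids: Iterable[str],
    get_status: Callable[[str], str],   # hypothetical status lookup
    timeout_s: float = 900.0,           # raised from 600s per this PR
    poll_s: float = 5.0,                # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every document is terminal or the deadline passes.

    Documents that are still pending at the deadline are marked "timeout".
    """
    deadline = clock() + timeout_s
    pending = set(doc_ids)
    results: Dict[str, str] = {}
    while pending and clock() < deadline:
        for doc_id in list(pending):
            status = get_status(doc_id)
            if status in TERMINAL_STATES:
                results[doc_id] = status
                pending.discard(doc_id)
        if pending:
            sleep(poll_s)
    for doc_id in pending:
        results[doc_id] = "timeout"
    return results


def cleanup(
    results: Dict[str, str],
    delete_document: Callable[[str], None],  # hypothetical deleter
) -> List[str]:
    """Delete finished documents and leave timed-out ones in place.

    A timed-out document may still have an SQS retry in flight, so its
    objects are kept to avoid a retry failing on a missing S3 object.
    """
    skipped: List[str] = []
    for doc_id, status in results.items():
        if status == "timeout":
            skipped.append(doc_id)
            continue
        delete_document(doc_id)
    if skipped:
        print(f"Skipped cleanup of {len(skipped)} timed-out document(s): {sorted(skipped)}")
    return skipped
```

Passing `clock` and `sleep` as parameters is only a convenience for exercising the timeout path without waiting; the real function may simply call `time` directly.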

SLO and concurrency reporting updates:

  • Improved the concurrency cap SLO logic in report.py to make clear that Lambda concurrency and throttles are the authoritative signals. SQS in-flight is a noisy proxy: it is now reported for reference but not used for gating. Updated the SLO output to include the SQS in-flight peak as a proxy value. [1] [2]
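
One way to picture the split between gating and reference signals is the sketch below. It is an illustration under assumptions, not the logic in report.py: the function name, the `ConcurrencyVerdict` type, and the exact pass condition (peak Lambda concurrency at or below the cap) are hypothetical. The description does not say how throttles enter the verdict, so here they are only reported alongside the result.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyVerdict:
    passed: bool
    detail: str


def evaluate_concurrency_cap(
    lambda_peak_concurrency: int,
    lambda_throttles: int,
    sqs_in_flight_peak: int,
    cap: int,
) -> ConcurrencyVerdict:
    """Gate on Lambda concurrency; report SQS in-flight as a proxy only.

    Assumption: the SLO passes when peak Lambda concurrency stays at or
    below the cap. The SQS in-flight peak never affects `passed`.
    """
    passed = lambda_peak_concurrency <= cap
    detail = (
        f"Lambda peak concurrency {lambda_peak_concurrency} vs cap {cap}; "
        f"throttles {lambda_throttles}; "
        f"SQS in-flight peak {sqs_in_flight_peak} (proxy, not gated)"
    )
    return ConcurrencyVerdict(passed=passed, detail=detail)
```

With this shape, a run whose SQS in-flight peak exceeds the cap still passes as long as Lambda concurrency stayed within it, which matches the intent of treating the queue metric as reference only.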

Documentation and findings:

  • Expanded the ADR post-implementation section with detailed results and findings from burst and sustained load runs, documenting SLO verdicts, root causes of failures, and rationale for recent harness and measurement changes.

gafnts merged commit 40da8db into develop on Jun 7, 2026
1 check passed
gafnts deleted the load/agentic-extractor branch on June 7, 2026 21:11