feat(aws-serverless): add aws-serverless plugin#52

Merged
krokoko merged 19 commits into awslabs:main from gunnargrosch:feat/serverless-agents
Mar 6, 2026

@gunnargrosch
Contributor

@gunnargrosch gunnargrosch commented Feb 24, 2026

Summary

Related RFC: #48

  • Add the aws-serverless plugin with three skills, MCP server configuration, SAM template validation hook, and marketplace entry
  • aws-lambda: Lambda runtime behavior, event sources, EventBridge, Step Functions, orchestration, observability, optimization, and troubleshooting
  • aws-serverless-deployment: SAM and CDK project setup, CDK constructs and patterns, CI/CD pipelines, and SAM/CDK coexistence
  • aws-lambda-durable-functions (by @bfreiberg): Getting started, checkpoint-replay model, testing guidance, advanced patterns, and error handling

Changes

Plugin infrastructure:

  • Add plugin.json and marketplace.json entry for the aws-serverless plugin
  • Add .mcp.json for awslabs.aws-serverless-mcp-server (SAM CLI tools, event source mappings, webapp deployment, metrics, schemas)
  • Add hooks/hooks.json + scripts/validate-template.sh for automatic sam validate on template edits

Skill — aws-lambda (9 reference files):

  • getting-started.md — project type decision tree, prerequisites, working with existing projects
  • event-sources.md — DynamoDB Streams, Kinesis, SQS, Kafka, S3, SNS configuration
  • event-driven-architecture.md — EventBridge bus setup, event patterns, Pipes, archive and replay
  • orchestration-and-workflows.md — orchestration approach comparison, durable functions vs Step Functions
  • step-functions.md — Standard vs Express, ASL, JSONata, SDK integrations, Distributed Map, testing
  • web-app-deployment.md — Lambda Web Adapter, API endpoints, CORS, authentication, custom domains
  • observability.md — structured logging, tracing, metrics, alarms, dashboards
  • optimization.md — cold starts, memory tuning, cost, streaming, Powertools
  • troubleshooting.md — common errors, debugging, deployment failures

Skill — aws-serverless-deployment (5 reference files):

  • sam-project-setup.md — SAM templates, deployment workflow, local testing, container images
  • cdk-project-setup.md — CDK setup, construct levels, IAM grants, stack separation, testing, pipelines
  • cdk-lambda-constructs.md — NodejsFunction, PythonFunction, base Function construct examples
  • cdk-serverless-patterns.md — API Gateway, Function URL, EventBridge, DynamoDB, SQS CDK patterns
  • sam-cdk-coexistence.md — incremental migration, using sam build with CDK templates

Skill — aws-lambda-durable-functions (9 reference files, by @bfreiberg):

  • getting-started.md — SDK installation, basic handler pattern, ESLint/Jest setup
  • replay-model-rules.md — determinism rules, non-deterministic code handling
  • step-operations.md — atomic operations, retry logic, step semantics
  • wait-operations.md — delays, callbacks, external system integration, polling
  • concurrent-operations.md — parallel execution, map operations, batch processing
  • error-handling.md — retry strategies, saga pattern, compensating transactions
  • testing-patterns.md — LocalDurableTestRunner, cloud testing, flaky test prevention
  • deployment-iac.md — CloudFormation, CDK, SAM deployment patterns
  • advanced-patterns.md — GenAI agents, completion policies, custom serialization

Origin

The aws-lambda and aws-serverless-deployment skills are based on https://github.com/gunnargrosch/aws-serverless-plugin

Test plan

  • Verify mise run lint:manifests passes for plugin.json, marketplace.json, and .mcp.json
  • Verify mise run lint:md passes for all SKILL.md and reference files
  • Verify mise run lint:cross-refs passes
  • Test plugin locally with claude --plugin-dir ./plugins/aws-serverless

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

gunnargrosch and others added 2 commits February 24, 2026 16:19
aws-serverless-deployment skills

Add the aws-serverless plugin with two new skills, MCP server
configuration, SAM template validation hook, and marketplace entry.

- aws-lambda: Lambda runtime behavior, event sources, EventBridge,
  Step Functions, orchestration, observability, optimization, and
  troubleshooting
- aws-serverless-deployment: SAM and CDK project setup, CDK constructs
  and patterns, CI/CD pipelines, and SAM/CDK coexistence

Also adds the aws-serverless MCP server (.mcp.json), a SAM template
validation hook, and updates the marketplace registry.

Based on https://github.com/gunnargrosch/aws-serverless-plugin

Add the aws-lambda-durable-functions skill to the aws-serverless
plugin with getting started guide, checkpoint-replay model, testing
guidance, advanced patterns, and error handling references.
@gunnargrosch gunnargrosch requested a review from a team February 24, 2026 15:23
krokoko and others added 5 commits February 25, 2026 16:27
- Remove sensitive data access flags from MCP server configuration
- Add --lint flag to SAM template validation for stricter checks
- Update validation messages to reflect linting improvements
- Reorganize SKILL.md with clearer onboarding steps and prerequisites
- Refine skill description to focus on core capabilities
- Add advanced-error-handling.md reference guide for timeout and circuit breaker patterns
- Update reference file routing to include advanced error handling scenarios
- Consolidate guidelines into onboarding section for better user flow
- Improve documentation structure for better discoverability
… requirements

- Split Powertools documentation into dedicated reference file (powertools.md)
- Update SKILL.md routing to direct Powertools queries to new dedicated reference
- Update observability.md link to point to powertools.md instead of optimization.md
- Clarify Python runtime requirements for durable functions (3.11+ minimum, 3.13+ for Lambda pre-installed SDK)
@krokoko
Contributor

krokoko commented Feb 27, 2026

My feedback (in addition to previous comments from other reviewers which need to be resolved):

  • Issue: version mismatch. marketplace.json sets version: "1.0.0" but plugin.json sets version:
    "1.1.0". These should match. For a new plugin, 1.0.0 in both is appropriate.
  • The aws-lambda and aws-serverless-deployment skills include "Guidelines" sections that
    say "Ask which IaC framework" and "Ask which programming language" — this is good for user
    interaction but there are no explicit defaults specified (e.g., "Default: SAM for new
    projects"). The design guidelines state: "Specify defaults clearly, then provide override
    syntax" and "Agents need explicit guidance on what to do when users don't specify preferences."
  • Neither aws-lambda/SKILL.md nor aws-serverless-deployment/SKILL.md document what to do if
    the aws-serverless-mcp MCP server is unavailable. The design guidelines require explicit error
    scenarios: "Inform user: 'Cost estimation unavailable (awspricing MCP not responding)' ... DO
    NOT continue without user confirmation."
  • The durable-functions/advanced-patterns.md contains a full section on "Advanced Error Handling"
    (timeout handling with waitForCallback, local timeout with Promise.race) that overlaps
    significantly with durable-functions/advanced-error-handling.md. Both files contain nearly
    identical code examples for callback timeout handling and Promise.race patterns. This wastes
    context tokens when both files are loaded.
  • Typos: In aws-serverless-deployment/SKILL.md:
    • "prevents access to sensitve data" → "sensitive"
    • "To grant accees" → "access"
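
A minimal sketch of how the version mismatch in the first item could be caught automatically. The function name and the assumed layout of marketplace.json (a top-level "plugins" list whose entries carry "name" and "version") are illustrative assumptions, not taken from the repository's lint tooling.

```python
from typing import Optional


def find_version_mismatch(plugin_manifest: dict, marketplace: dict) -> Optional[str]:
    """Return a message if the two manifests disagree on the version, else None."""
    name = plugin_manifest.get("name")
    # Assumption: marketplace.json holds a "plugins" list of {"name", "version", ...} entries.
    entry = next(
        (p for p in marketplace.get("plugins", []) if p.get("name") == name),
        None,
    )
    if entry is None:
        return f"{name}: no entry in marketplace.json"
    if entry.get("version") != plugin_manifest.get("version"):
        return (
            f"{name}: plugin.json has {plugin_manifest.get('version')}, "
            f"marketplace.json has {entry.get('version')}"
        )
    return None


# Values mirror the mismatch reported in the review comment.
plugin = {"name": "aws-serverless", "version": "1.1.0"}
market = {"plugins": [{"name": "aws-serverless", "version": "1.0.0"}]}
print(find_version_mismatch(plugin, market))
```

In practice the two dictionaries would come from `json.load` on the real files; the check itself does not depend on where they live.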

- Move troubleshooting production executions content to dedicated troubleshooting-executions.md reference file
- Set language and IaC framework defaults including override syntax
- Add error scenario handling for unsupported languages and frameworks
- Remove inline troubleshooting agent instructions from main SKILL.md for better modularity
- Consolidate advanced error handling references to separate advanced-error-handling.md file
- Update plugin version from 1.1.0 to 1.0.0
@bfreiberg
Contributor

Thanks for the feedback @krokoko, I've pushed adjustments for all of it

@krokoko
Contributor

krokoko commented Mar 2, 2026

Thank you! What's left:

  1. Add a CODEOWNERS entry for plugins/aws-serverless/. Create a team and request @awslabs/agent-plugins-writers as the parent, following https://github.com/awslabs/agent-plugins/blob/main/docs/MAINTAINERS_GUIDE.md#plugin-teams
  2. Add README.md table entry for the new plugin
  3. Clarify .mcp.json default flags vs. what the SKILL.md describes — aws-lambda/SKILL.md mentions --allow-write and --allow-sensitive-data-access as configured, but
    they're not in the actual .mcp.json
  4. Address remaining reviewer comments from @reedham-aws and @bfreiberg (runtime version lists, SAM/SAM CLI terminology, Docker vendor specificity)

bfreiberg and others added 4 commits March 2, 2026 20:36
- Add aws-serverless plugin to CODEOWNERS with appropriate team assignments
- Add aws-serverless plugin to main README.md plugin table with feature description
- Simplify AWS CLI setup instructions and remove redundant steps
- Separate SAM CLI and container runtime setup into distinct sections
- Update container runtime documentation to mention alternatives
- Clarify MCP server default security posture and flag requirements in Lambda skill
Signed-off-by: Alain Krok <alkrok@amazon.com>
krokoko
krokoko previously approved these changes Mar 4, 2026
Contributor

@krokoko krokoko left a comment

Minor edits can be added but no blocker for a first release

@krokoko krokoko added the do-not-merge Do not merge the pull request label Mar 4, 2026
@scottschreckengaust
Member

Code Review -- PR #52 feat(aws-serverless): add aws-serverless plugin

Reviewed at commit 6446d439198a29efe5f9ea8140e82d2253e77513. Ran code-review (5 parallel agents) and pr-review-toolkit (code-reviewer, comment-analyzer, silent-failure-hunter).


Critical Issues (4)

1. .mcp.json argument --allow write is the wrong format -- MCP server will reject it or start read-only

The upstream awslabs.aws-serverless-mcp-server uses argparse with --allow-write (hyphenated boolean flag, action='store_true'). The .mcp.json passes "--allow write" as a single string element in the args array, which is delivered to the process as one argv token containing a space. argparse will not recognize it and will either error or silently start without write permissions, disabling deployment capabilities (sam_deploy, deploy_webapp, etc.).

"args": [
"awslabs.aws-serverless-mcp-server@latest",
"--allow write"
],
"env": {

All three SKILL.md files also document this incorrectly as --allow write:

  • **Write access is enabled by default.** The plugin ships with `--allow write` in `.mcp.json`, so the MCP server can create projects, generate IaC, and deploy on behalf of the user.
    Access to sensitive data (like Lambda and API Gateway logs) is **not** enabled by default. To grant it, add `--allow-sensitive-data-access` to `.mcp.json`.
  • **Write access is enabled by default.** The plugin ships with `--allow write` in `.mcp.json`, so the MCP server can create projects, generate IaC, and deploy on behalf of the user.
    Access to sensitive data (like Lambda and API Gateway logs) is **not** enabled by default. To grant it, add `--allow-sensitive-data-access` to `.mcp.json`.
  • **Write access is enabled by default.** The plugin ships with `--allow write` in `.mcp.json`, so the MCP server can create projects, generate IaC, and deploy on behalf of the user.
    Access to sensitive data (like Lambda and API Gateway logs) is **not** enabled by default. To grant it, add `--allow-sensitive-data-access` to `.mcp.json`.

Fix: change "--allow write" to "--allow-write" in .mcp.json and update documentation to match.
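
The behavior described above can be reproduced with a small stand-in parser. This is a sketch only: the parser below mirrors the flag shape named in the review (`--allow-write` and `--allow-sensitive-data-access` as `store_true` flags) and is not the upstream server's code. Whether a real server errors or starts read-only would depend on whether it calls `parse_args` or `parse_known_args`, which is an assumption here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser (hypothetical); only the flag names come from the review.
    parser = argparse.ArgumentParser(prog="mcp-server-sketch")
    parser.add_argument("--allow-write", action="store_true")
    parser.add_argument("--allow-sensitive-data-access", action="store_true")
    return parser


def lenient_parse(argv):
    # parse_known_args hands back unrecognized tokens instead of exiting.
    return build_parser().parse_known_args(argv)


def strict_exit_code(argv) -> int:
    # parse_args exits via SystemExit when it meets an unrecognized token.
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return 0


# One args element containing a space arrives as ONE argv token.
broken_ns, broken_extra = lenient_parse(["--allow write"])
fixed_ns, fixed_extra = lenient_parse(["--allow-write"])

print(broken_ns.allow_write, broken_extra)  # write flag stays off, token is left over
print(fixed_ns.allow_write, fixed_extra)    # write flag is on, nothing left over
print(strict_exit_code(["--allow write"]))  # non-zero: argparse rejects the token
```

With lenient parsing the misspelled token is silently ignored and write access stays disabled; with strict parsing the process exits before starting. Both outcomes match the two failure modes named in the finding.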


2. validate-template.sh crashes when jq is missing -- breaks every file edit, not just SAM templates

The script uses jq on line 9 to extract the file path, but there is no command -v jq guard. With set -euo pipefail on line 5, a missing jq causes a non-zero exit on every Edit/Write operation -- including non-template files -- because the crash occurs before the filename filter on lines 12-15. The SKILL.md documentation claims the hook "silently skips if [jq] is not available." This is false.

set -euo pipefail
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
# Only validate SAM template files
case "$FILE_PATH" in
  *template.yaml|*template.yml) ;;
  *) exit 0 ;;
esac

Fix: add command -v jq check before line 9, matching the existing sam check on lines 18-20:

if ! command -v jq &> /dev/null; then
  exit 0
fi

3. Python version for durable functions is inconsistent: 3.14+ vs 3.13+ vs 3.11+

Three different minimum Python versions appear across files for the same feature:

| File | Stated version |
|---|---|
| aws-lambda/references/getting-started.md | Python 3.14+ |
| aws-lambda/references/orchestration-and-workflows.md | Python 3.13+ |
| aws-lambda-durable-functions/SKILL.md | Python 3.11+ (SDK min), 3.13+ (pre-installed) |

Python 3.14 is not yet GA. A user reading getting-started.md is told they need a runtime that does not exist.

# Getting Started with AWS Serverless Development

Fix: standardize on "Python 3.13+" (pre-installed in Lambda runtime) across all files.


4. SAM template output references non-existent logical resource DurableFunctionAlias

The SAM template example uses AutoPublishAlias: prod, which causes SAM to generate an alias with the logical ID DurableFunctionAliasProd (convention: <FunctionLogicalId>Alias<AliasName>). The Outputs section references !Ref DurableFunctionAlias, which does not exist. Users deploying this template get a CloudFormation failure.

Fix: change the SAM Outputs to !Ref DurableFunctionAliasProd or remove the Outputs section from the SAM example.


Important Issues (7)

5. IAM action name inconsistency: singular vs plural CheckpointDurableExecution(s)

aws-lambda/SKILL.md uses the singular form lambda:CheckpointDurableExecution. aws-lambda-durable-functions/SKILL.md and deployment-iac.md use the plural lambda:CheckpointDurableExecutions. Only one is the real IAM action -- the wrong one silently fails in IAM policies.

6. nodejs24.x / python3.14 runtimes in deployment-iac.md may not exist yet

All CloudFormation, SAM, and CDK examples in deployment-iac.md use nodejs24.x, python3.14, NODEJS_24_X, and PYTHON_3_14. These are forward-looking runtimes/CDK enums that may not be available in Lambda or aws-cdk-lib. Users deploying as-is will get InvalidParameterValueException.

7. Lambda burst scaling rate conflict: 1,000 vs 500 per 10 seconds

aws-lambda/SKILL.md (Lambda Limits table) says 1,000 new executions per 10s. aws-lambda/references/troubleshooting.md says 500 per 10s. These are in the same skill.

8. step-functions-testing.md not reachable from SKILL.md

The file exists but is not listed in the "When to Load Reference Files" router in aws-lambda/SKILL.md. An agent will only find it by first loading step-functions.md and following an internal link.
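
A check for this kind of orphaned file could be sketched as follows. It assumes that SKILL.md routes to reference files by mentioning them as `references/<file>`; that link format and the function name are assumptions for illustration, not the behavior of the repository's `lint:cross-refs` task.

```python
from typing import Iterable, List


def unrouted_references(skill_markdown: str, reference_files: Iterable[str]) -> List[str]:
    """Return reference file names that SKILL.md never mentions as references/<name>."""
    return sorted(
        name
        for name in reference_files
        # Assumption: the router links each file as "references/<name>".
        if f"references/{name}" not in skill_markdown
    )


# Hypothetical excerpt of a routing table and the files on disk.
skill_text = """
| Topic          | Load                          |
| Step Functions | references/step-functions.md  |
| Observability  | references/observability.md   |
"""
files = ["observability.md", "step-functions.md", "step-functions-testing.md"]
print(unrouted_references(skill_text, files))
```

In a real check the file list would come from listing the skill's references directory and the text from reading SKILL.md.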

9. No timeout in .mcp.json

The existing deploy-on-aws plugin sets "timeout": 120000. Without a timeout, uvx package download or MCP tool calls can hang indefinitely, blocking the user's session.

10. Hook timeout of 30s may be too short

sam validate --lint on first invocation downloads the cfn-lint schema cache, which can exceed 30s. A timeout results in no validation output (neither success nor failure).

11. jq failure on the error-reporting path (line 32) loses validation output

If jq fails when formatting the sam validate error output, set -euo pipefail aborts the script and the actual validation error message is lost. A fallback should be added.


Suggestions (6)

12. 23 of 27 reference files exceed the 100-line guideline in DESIGN_GUIDELINES.md (up to 559 lines). Consider splitting for context window efficiency.

13. The hooks field in aws-lambda-durable-functions/SKILL.md frontmatter is not in skill-frontmatter.schema.json. Verify the skill runtime processes it; if not, the replay model violation reminders silently never fire.

14. The circuit breaker example in advanced-error-handling.md uses closure mutations (failureCount, lastFailureTime) that violate the replay model rules documented in the same PR. Add a caveat or refactor.

15. MCP configuration sections are duplicated across all 3 SKILL.md files. Consider a shared reference file to prevent drift.

16. Verify SDK package names (@aws/durable-execution-sdk-js, aws-durable-execution-sdk-python, etc.) exist on npm/PyPI before merge.

17. Silent exit 0 when SAM CLI is missing gives no user feedback. Consider returning a system message: "SAM CLI not found -- template validation skipped.".
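
To illustrate why the closure mutations in suggestion 14 conflict with a checkpoint-replay model, here is a toy simulation. `ToyDurableContext` is a hypothetical stand-in that only models "completed steps return their checkpointed result without re-running the body"; it is not the durable execution SDK, and the derived-counter variant is one possible refactor rather than the SDK's documented recommendation.

```python
from typing import Callable, Dict


class ToyDurableContext:
    """Toy checkpoint-replay context (hypothetical, not the real SDK)."""

    def __init__(self, checkpoints: Dict[str, str]):
        self.checkpoints = checkpoints

    def step(self, name: str, fn: Callable[[], str]) -> str:
        if name in self.checkpoints:
            # Replay: return the stored result, the step body is skipped.
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result
        return result


def handler_with_closure_counter(ctx: ToyDurableContext) -> int:
    failure_count = 0

    def flaky_call() -> str:
        nonlocal failure_count
        failure_count += 1  # only happens when the body actually runs
        return "failed"

    for attempt in range(3):
        ctx.step(f"call-{attempt}", flaky_call)
    return failure_count


def handler_with_derived_counter(ctx: ToyDurableContext) -> int:
    # Derive the count from step results, which are identical on replay.
    results = [ctx.step(f"call-{i}", lambda: "failed") for i in range(3)]
    return sum(1 for r in results if r == "failed")


closure_store: Dict[str, str] = {}
closure_first = handler_with_closure_counter(ToyDurableContext(closure_store))
closure_replay = handler_with_closure_counter(ToyDurableContext(closure_store))

derived_store: Dict[str, str] = {}
derived_first = handler_with_derived_counter(ToyDurableContext(derived_store))
derived_replay = handler_with_derived_counter(ToyDurableContext(derived_store))

print(closure_first, closure_replay)  # counts differ between run and replay
print(derived_first, derived_replay)  # counts agree between run and replay
```

In this toy model the closure counter is lost on replay because the step bodies that incremented it are skipped, so any circuit-breaker decision based on it could diverge between the original run and the replay.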


Strengths

  • Well-separated skills with clear cross-referencing between them
  • Replay model rules documentation is outstanding (paired WRONG/CORRECT examples in TypeScript and Python)
  • Production-ready event sources guide with real-world warnings (recursive S3 triggers, URL-decoding)
  • Progressive disclosure via "When to Load Reference Files" routing tables
  • Honest orchestration comparison (durable functions vs Step Functions)
  • All three SKILL.md files within the 300-line limit
  • Proper defensive shell scripting (set -euo pipefail, exit code capture, file existence check)
  • Complete MCP unavailability error handling in all skills

Generated with Claude Code

If this review was useful, please react with 👍. Otherwise, react with 👎.

Member

@scottschreckengaust scottschreckengaust left a comment

Review critical issues

- Fix IAM permission name from `CheckpointDurableExecutions` to `CheckpointDurableExecution`
- Update MCP server configuration flag from `--allow write` to `--allow-write` for consistency
- Increase validation script timeout from 30s to 120s and MCP server timeout to 120000ms
- Add jq dependency check to validate-template.sh with graceful fallback message
- Improve SAM CLI validation error handling with fallback JSON formatting
- Add circuit breaker replay model caveat to advanced error handling documentation
- Fix CloudFormation reference from `DurableFunctionAlias` to `DurableFunction.Alias`
- Add Step Functions testing reference to Lambda skill navigation guide
- Enhance error messages for missing dependencies and validation failures
…fication

- Remove PostToolUse hooks from aws-lambda-durable-functions SKILL.md that provided replay model reminders
- Update getting-started.md to remove hardcoded docker --version check
- Replace Docker requirement with flexible container runtime verification (Docker, Finch, Podman, etc.)
- Simplify credential setup instructions to be more concise
@bfreiberg
Copy link
Contributor

Thanks @scottschreckengaust for your thorough review. I've pushed fixes for all findings except:

  1. The Python runtime versions, as the required version differs depending on whether you are building a Lambda durable function or a "regular" function.
  2. The AutoPublishAlias configuration, as the documentation states differently (https://docs.aws.amazon.com/lambda/latest/dg/durable-getting-started-iac.html) and I've tested that the current version works as intended.

@krokoko krokoko self-requested a review March 4, 2026 22:58
krokoko
krokoko previously approved these changes Mar 5, 2026
- Condense advanced error handling patterns with concise implementation approaches
- Streamline advanced patterns documentation with focused guidance
- Simplify error handling reference with key considerations
- Update step operations documentation for clarity
- Refactor testing patterns reference for better readability
- Consolidate troubleshooting executions guide
- Simplify wait operations documentation
- Update orchestration and workflows reference in main Lambda skill
- Remove verbose code examples in favor of pattern descriptions and implementation approaches
- Normalize marketplace tags from hyphenated to space-separated format
- Fix heading hierarchy in durable functions SKILL.md
- Reorder CODEOWNERS entries alphabetically
- Update Python code examples with proper imports and corrected API usage patterns
- Improve code consistency and readability across durable functions documentation
…erns

- Fix error property reference in testing-patterns.md from `getError()?.message` to `getError()?.errorMessage`
- Update aws-serverless-deployment description to include "use SAM" trigger phrase
- Clarify Python SDK differences from TypeScript for durable functions implementation
@krokoko krokoko self-requested a review March 6, 2026 14:51
@krokoko krokoko requested a review from XinyuQu March 6, 2026 16:02
@krokoko krokoko enabled auto-merge March 6, 2026 19:40
@krokoko krokoko dismissed scottschreckengaust’s stale review March 6, 2026 19:40

OOTO and changes were implemented

@krokoko krokoko removed the do-not-merge Do not merge the pull request label Mar 6, 2026
@krokoko krokoko added this pull request to the merge queue Mar 6, 2026
Merged via the queue into awslabs:main with commit 6a5c696 Mar 6, 2026
24 of 25 checks passed
@bfreiberg bfreiberg deleted the feat/serverless-agents branch March 6, 2026 19:45
icarthick added a commit to icarthick/agent-plugins that referenced this pull request Mar 7, 2026
* main:
  fix(lint): pretty up JSON (awslabs#62)
  chore(deps): update github-actions: Bump actions/upload-artifact (awslabs#80)
  chore(deps): update github-actions: Bump actions/download-artifact (awslabs#81)
  chore(deps): update github-actions: Bump actions/dependency-review-action (awslabs#83)
  feat(aws-serverless): add aws-serverless plugin (awslabs#52)
  docs: remove dead references to .claude/docs/ files that were never committed (awslabs#78)

# Conflicts:
#	.claude-plugin/marketplace.json
#	README.md

6 participants