5 changes: 5 additions & 0 deletions .gitignore
@@ -9,3 +9,8 @@
Thumbs.db
Desktop.ini
samples/deployment-compliance/skills/.DS_Store

# Lab launcher state
labs/**/.deployed/
labs/**/*.legacy

127 changes: 127 additions & 0 deletions labs/AGENTS.md
@@ -0,0 +1,127 @@
# AGENTS.md — Authoring a Zava Unlimited lab

> Audience: AI assistants (Copilot CLI, Claude Code, Cursor, VS Code Copilot,
> GitHub Copilot Workspace) helping a human contributor add a new lab to the
> Zava Unlimited SRE Agent demo platform.

This file is the universal contract. Read it end-to-end before generating any
files. The human contributor will paste a prompt like:

> "Help me add a new lab for Azure SQL connection-pool exhaustion."

Your job: interview them, then scaffold a working lab they can `azd up`.

## Platform shape

Every lab lives in `labs/<lab-name>/` and provides:

| File | Required | Purpose |
|---|---|---|
| `lab.yaml` | ✅ | Manifest — see `_platform/schema/lab.schema.json` |
| `azure.yaml` | ✅ | azd entrypoint with pre/postprovision hooks |
| `infra/main.bicep` | ✅ | Subscription-scoped bicep that creates RG + resources |
| `scripts/check-environment.ps1` | ✅ | preprovision: prereq + prompt collection (reads `lab.yaml`) |
| `scripts/post-provision.ps1` | ✅ | postprovision: image build, srectl apply, write `.deployed/<name>.json`, optionally launch sim |
| `scripts/scenarios/*.ps1` (or `.py`/`.sh`) | ⚪ | One file per break/fix scenario declared in `lab.yaml` |
| `simulator/` | ⚪ | Lab's own rich sim UI (the meta-sim's `sim.command` points here) |
| `README.md` | ✅ | First non-blank, non-heading line is the launcher's description |
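
As a rough illustration of the table above, the required files could be checked with a few lines of Python. This is only a sketch: `missing_required_files` is a hypothetical helper, not part of `_platform/`, and it checks presence only, not content.

```python
from pathlib import Path

# Required entries from the table above, relative to labs/<lab-name>/.
REQUIRED_FILES = (
    "lab.yaml",
    "azure.yaml",
    "infra/main.bicep",
    "scripts/check-environment.ps1",
    "scripts/post-provision.ps1",
    "README.md",
)


def missing_required_files(lab_dir: Path) -> list[str]:
    """Return the required files that do not exist under lab_dir."""
    return [rel for rel in REQUIRED_FILES if not (lab_dir / rel).is_file()]
```

An empty result means every required file is present; the optional `scripts/scenarios/` and `simulator/` entries are deliberately not checked.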

## Contract: lab.yaml

Read the full schema at `_platform/schema/lab.schema.json`. An annotated example
is at `_platform/schema/lab.example.yaml`. Key points:

- `name` must equal the directory name (kebab-case)
- `prereqs` lists CLI tools that must be on PATH (e.g. `az`, `azd`, `srectl`)
- `prompts` declares values the launcher collects interactively and stashes in
azd env (use SCREAMING_SNAKE for `name`)
- `scenarios[].runner` is a path relative to the lab root; the meta-sim shells
out to it. `.ps1` runs in pwsh, `.py` in python, `.sh` in bash.
- `sim.command` + `args` is how the meta-sim launches the lab's own rich UI
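
The key points above can be sketched as plain checks over an already-parsed manifest. This is a minimal illustration, not the real validator (that is `_platform/helpers/manifest.py`, driven by the JSON schema). The function name `check_manifest` and the exact regular expressions are assumptions, and the manifest is passed in as a `dict` so the sketch does not depend on a YAML library.

```python
import os
import re

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
SCREAMING_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")
# Runner extension -> interpreter, per the bullet list above.
INTERPRETERS = {".ps1": "pwsh", ".py": "python", ".sh": "bash"}


def check_manifest(manifest: dict, dir_name: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    name = manifest.get("name", "")
    if name != dir_name:
        problems.append(f"name {name!r} does not match directory {dir_name!r}")
    if not KEBAB.match(name):
        problems.append(f"name {name!r} is not kebab-case")
    for prompt in manifest.get("prompts", []):
        prompt_name = prompt.get("name", "")
        if not SCREAMING_SNAKE.match(prompt_name):
            problems.append(f"prompt {prompt_name!r} is not SCREAMING_SNAKE")
    for scenario in manifest.get("scenarios", []):
        runner = scenario.get("runner", "")
        extension = os.path.splitext(runner)[1]
        if extension not in INTERPRETERS:
            problems.append(f"runner {runner!r} has no known interpreter")
    return problems
```

Returning a list of problems rather than raising on the first one lets an assistant report every issue to the contributor in a single pass.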

## Contract: post-provision.ps1

The one **mandatory** thing this script must do at the end of a successful
deploy is write `.deployed/<lab-name>.json` with at minimum:

```json
{
  "name": "<lab-name>",
  "deployedAt": "<ISO timestamp>",
  "subscriptionId": "<sub>",
  "resourceGroup": "<rg>",
  "region": "<location>"
}
```

Add any extra fields the lab's sim or scenarios need (e.g. `sreAgentName`,
`portalUrl`, `containerRegistryName`). The meta-sim and scenario runners read
this file to know what's deployed and where.

If `$env:LAB_NO_AUTOLAUNCH` is set, do NOT launch the sim at the end (the
multi-lab launcher sets this to avoid blocking).
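
The real script is PowerShell, but the contract itself is language-neutral. The Python sketch below shows the same write/read/auto-launch logic, for example for a `.py` scenario runner that needs to read the record. All three function names are hypothetical, and treating an empty `LAB_NO_AUTOLAUNCH` value as "not set" is an assumption.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

REQUIRED_KEYS = ("name", "deployedAt", "subscriptionId", "resourceGroup", "region")


def write_deployed_record(deployed_dir: Path, name: str, subscription_id: str,
                          resource_group: str, region: str, **extra) -> Path:
    """Write <deployed_dir>/<name>.json with the minimum fields plus any extras."""
    record = {
        "name": name,
        "deployedAt": datetime.now(timezone.utc).isoformat(),
        "subscriptionId": subscription_id,
        "resourceGroup": resource_group,
        "region": region,
        **extra,
    }
    deployed_dir.mkdir(parents=True, exist_ok=True)
    path = deployed_dir / f"{name}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def read_deployed_record(path: Path) -> dict:
    """Load a record and fail loudly if a minimum field is missing."""
    record = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"{path.name} is missing {missing}")
    return record


def should_autolaunch(env=os.environ) -> bool:
    """Launch the sim only when LAB_NO_AUTOLAUNCH is unset (or empty)."""
    return not env.get("LAB_NO_AUTOLAUNCH")
```

Extra fields such as `sreAgentName` pass straight through `**extra`, so a lab can extend the record without changing the helper.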

## Authoring flow — what to ask the contributor

Don't ask everything at once. Ask in this order:

1. **What does this lab demonstrate?** (one sentence)
2. **Lab name** (kebab-case, e.g. `zava-fintech`) and **subsidiary** (e.g.
`Zava Fintech` for the displayName/branding)
3. **What Azure compute?** (AKS, ACA, VM, App Service, Functions, …) — this
shapes `infra/main.bicep`
4. **What sample app?** (existing image? new code in `src/`? none / infra-only?)
5. **What integrations?** (ServiceNow, GitHub, Datadog, …) — each adds a
prompt and likely a connector + secret
6. **What scenarios?** Get 3-8 break/fix scenarios. For each:
- id (kebab-case)
- what breaks
- what the agent should do
- approximate runtime
7. **Any non-Azure prereqs?** (`docker`, `srectl`, `kubectl`, `helm`, …)

Then:

1. Run `pwsh ./labs/lab.ps1 -New <lab-name>` — this drops the skeleton
2. Edit `lab.yaml` to fill in prereqs, prompts, scenarios collected above
3. Edit `infra/main.bicep` for the resources implied by questions 3-5 above
4. For each scenario, create `scripts/scenarios/<id>.ps1` from the example
template (it shows the polling-for-thread-URL pattern)
5. Validate: `python labs/_platform/helpers/manifest.py validate labs/<lab-name>/lab.yaml`
6. Test discovery: `pwsh ./labs/lab.ps1 -List` should show the new lab
7. (Optional) Deploy: `cd labs && ./lab.sh -Labs <lab-name>`

## Reference labs to mimic

- `zava-power/` — full ACA + ServiceNow + 8 scenarios. Best reference for
complex labs with rich integrations.
- `zava-athletic/` — simpler AKS+Postgres lab with 3 scenarios. Best
reference for single-domain labs.

## Hard rules — do not violate

- **Never modify `_platform/`** without an explicit ask from the human. That's
the platform itself; the lab-author flow only adds new lab dirs.
- **Never overwrite an existing lab dir.** If `<lab-name>` exists, ask the
human to pick a different name or explicitly confirm that the existing
directory should be deleted.
- **Never commit secrets.** Prompts with `secret: true` go to azd env at
deploy time; they must NOT be hardcoded into bicep, scripts, or yaml.
- **Bicep must be subscription-scoped** (`targetScope = 'subscription'`) and
create its own RG. azd assumes this.
- **`.deployed/` is gitignored.** Don't reference it from code that runs
before deploy. It only exists post-provision.
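
Two of these rules are easy to check mechanically before handing files back to the contributor. The sketch below is an assumption-laden illustration: both function names are hypothetical, and the Bicep check is a simple line-based regex, not a real Bicep parse.

```python
import re
from pathlib import Path

# Matches an uncommented line such as: targetScope = 'subscription'
TARGET_SCOPE = re.compile(r"^\s*targetScope\s*=\s*'subscription'", re.MULTILINE)


def is_subscription_scoped(bicep_source: str) -> bool:
    """True when the Bicep text declares targetScope = 'subscription'."""
    return TARGET_SCOPE.search(bicep_source) is not None


def lab_name_is_free(labs_root: Path, lab_name: str) -> bool:
    """True when labs_root/<lab_name> does not exist yet, so nothing is overwritten."""
    return not (labs_root / lab_name).exists()
```

If `lab_name_is_free` returns `False`, the right move is to stop and ask the human, not to pick a new name silently.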

## When you finish

Tell the human:

```
Lab '<name>' scaffolded. Next steps:
1. Review labs/<name>/lab.yaml and labs/<name>/infra/main.bicep
2. Implement scenario runners in labs/<name>/scripts/scenarios/
3. Validate: python labs/_platform/helpers/manifest.py validate labs/<name>/lab.yaml
4. Deploy: cd labs && ./lab.sh -Labs <name>
```

If you're a Copilot CLI user, the `lab-author` skill (in `.github/extensions/`)
wraps this whole flow with adaptive Q&A — use it instead of doing this manually.
118 changes: 118 additions & 0 deletions labs/LAUNCHER.md
@@ -0,0 +1,118 @@
# Zava Unlimited

A growing collection of Azure SRE Agent demo labs, all deployable through one
launcher and breakable through one meta-simulator.

> Zava is a fictional retail conglomerate. Each subsidiary (Zava Power, Zava
> Athletic, Zava Cafe, Zava Eats, Zava IT Support, Zava Infra) is a
> self-contained lab that demonstrates a different Azure workload + SRE Agent
> autonomy story.

## TL;DR

```bash
./lab.sh # POSIX — pick one or more labs to deploy
pwsh ./lab.ps1 # Windows / cross-platform
./sim.sh # POSIX — pick a deployed lab + scenario to break/fix
pwsh ./sim.ps1       # Windows / cross-platform
```

## Two top-level commands

| Command | Purpose |
|---|---|
| `lab.ps1` / `lab.sh` | Discover labs, prompt for inputs, run `azd up` |
| `sim.ps1` / `sim.sh` | Discover **deployed** labs, run break/fix scenarios |

## Currently shipping

| Lab | Subsidiary | Workload |
|---|---|---|
| `zava-power/` | Zava Power | ACA + ServiceNow utility ops, 8 scenarios |
| `zava-athletic/` | Zava Athletic | AKS + PostgreSQL e-commerce, 3 scenarios |
| `zava-cafe/` | Zava Cafe | App Service + Azure SQL specialty-coffee e-commerce |
| `zava-eats/` | Zava Eats | Starter lab — Grubify food-ordering sample, first break/fix |
| `zava-itsupport/` | Zava IT Support | ACA — IT helpdesk + ServiceNow MCP |
| `zava-infra/` | Zava Infra | 3 scenarios — tf-drift, perf-drift, compliance |

Run `pwsh ./lab.ps1 -List` for the live list.

## Authoring a new lab

Two paths:

1. **Conversational, in Copilot CLI:** install the `lab-author` skill (under
`.copilot/extensions/lab-author/`) and just say "Help me add a new lab to
Zava Unlimited". The skill will interview you and call the scaffolder.
2. **Manual / any AI assistant:** read `AGENTS.md` for the contract, then run
`pwsh ./lab.ps1 -New <kebab-name>` for the skeleton.

The platform is in `_platform/` — schema, helpers, template. Don't modify it
when adding a lab; just drop a new sibling directory.

## Multi-lab launcher (`lab.ps1`)

Interactive picker by default:

```
Which lab(s) do you want to deploy?
[1] zava-power ACA + ServiceNow utility-platform demo with 8 break/fix scenarios.
[2] zava-athletic AI-first AKS + PostgreSQL e-commerce demo with 3 break/fix scenarios.
[3] zava-eats Starter lab — Grubify food-ordering sample, first break/fix.
[a] all
[q] quit
```

Pick one, several (comma-separated), or `a` for all. Each lab gets its own
azd environment so they coexist cleanly.
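
The selection rule can be sketched in Python as follows. This is an illustration of the described behavior, not the launcher's actual code: `parse_selection` is a hypothetical name, and rejecting out-of-range numbers with `ValueError` is an assumption.

```python
def parse_selection(answer: str, labs: list[str]) -> list[str]:
    """Map picker input such as '1,3' or 'a' to lab names; 'q' selects nothing."""
    answer = answer.strip().lower()
    if answer == "q":
        return []
    if answer == "a":
        return list(labs)
    chosen = []
    for token in answer.split(","):
        index = int(token) - 1  # picker entries are 1-based; non-numeric input raises ValueError
        if not 0 <= index < len(labs):
            raise ValueError(f"no lab numbered {token.strip()}")
        if labs[index] not in chosen:  # ignore duplicates such as '1,1'
            chosen.append(labs[index])
    return chosen
```

For example, with the three labs shown in the picker, `parse_selection("1, 3", labs)` yields `zava-power` and `zava-eats` in that order.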

### Non-interactive

```bash
./lab.sh -Labs zava-power # deploy one
./lab.sh -Labs zava-power,zava-athletic # deploy multiple
./lab.sh -List # list available labs
./lab.sh -Down zava-power # tear down
./lab.sh -New my-new-lab # scaffold a new lab
```

### Behavior

- **Single-lab deploy** auto-launches the simulator at the end of postprovision.
- **Multi-lab deploy** sets `LAB_NO_AUTOLAUNCH=1` so postprovision finishes
cleanly; launch the sims manually afterwards with `./sim.sh -Lab <name>`.
- Deploys run sequentially (azd serializes resource state anyway).
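
A minimal sketch of that behavior, assuming a hypothetical `deploy_plan` helper that only decides the order and environment for each deploy (it does not invoke `azd`):

```python
def deploy_plan(labs: list[str], base_env: dict) -> list[tuple[str, dict]]:
    """Return (lab, env) pairs in the order the labs would be deployed."""
    env = dict(base_env)
    if len(labs) > 1:
        # Multi-lab deploy: suppress the sim auto-launch so no deploy blocks the next.
        env["LAB_NO_AUTOLAUNCH"] = "1"
    # Each lab gets its own copy so later mutation cannot leak between deploys.
    return [(lab, dict(env)) for lab in labs]
```

A caller would then walk the list in order, running one deploy at a time with the paired environment.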

## Meta-simulator (`sim.ps1`)

After one or more labs are deployed (each writes
`.deployed/<lab>.json` from its post-provision), `sim` discovers them:

```bash
./sim.sh # interactive: pick a deployed lab
./sim.sh -List # list deployed labs + their scenarios
./sim.sh -Lab zava-power # run that lab's full sim UI
./sim.sh -Scenario zava-power db-outage # run one scenario directly
```

If only one lab is deployed, `sim` enters its UI directly. If multiple are
deployed, you get a picker that includes a unified "scenarios across all
labs" view.
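
Discovery can be pictured as a glob over the `.deployed/` records followed by a branch on how many were found. The sketch below is an assumption about how `sim` might work, not its actual implementation: both function names are hypothetical, and it searches for `.deployed/` at any depth under the labs root, mirroring the `labs/**/.deployed/` ignore rule.

```python
import json
from pathlib import Path


def discover_deployed(labs_root: Path) -> dict[str, dict]:
    """Map lab name -> record for every .deployed/<lab>.json under labs_root."""
    found = {}
    for path in sorted(labs_root.glob("**/.deployed/*.json")):
        found[path.stem] = json.loads(path.read_text(encoding="utf-8"))
    return found


def entry_mode(deployed: dict) -> str:
    """'direct' for exactly one deployed lab, 'picker' for several, 'none' otherwise."""
    if not deployed:
        return "none"
    return "direct" if len(deployed) == 1 else "picker"
```

Keying the result by file stem relies on the `<lab>.json` naming convention, so a record's `name` field and its filename are expected to agree.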

## Anatomy of a lab

```
labs/<name>/
├── lab.yaml # manifest (schema in _platform/schema/)
├── azure.yaml # azd entrypoint w/ pre+postprovision hooks
├── infra/main.bicep # subscription-scoped IaC
├── scripts/
│ ├── check-environment.ps1 # preprovision: prereqs + prompts → azd env
│ ├── post-provision.ps1 # image build, srectl apply, write .deployed/
│ └── scenarios/<id>.ps1 # one runner per scenario in lab.yaml
├── simulator/ # (optional) lab's own rich UI
└── README.md
```

See `AGENTS.md` for the full contract and `_platform/schema/lab.example.yaml`
for an annotated manifest.