feat: pharaoh-sdd skill for gated V-model spec-driven development #21
Merged
Signed-off-by: Bartosz Burda <bartoszburda93@gmail.com>
- Terminal step now invokes pharaoh-quality-gate with project_root and run directory via self_review_coverage invariant; artefacts_summary_path marked optional because pharaoh-sdd runs no mece or coverage-gap tasks
- persist step names exact review JSON filename conventions matched by pharaoh-self-review-coverage-check (<id>_review.json, <id>_diagram_review.json, <id>_code_grounding.json)
- ubproject.toml added to Input section with note that [needs.links] is read from it for link-field convention; [pharaoh.traceability] references now clearly attributed to pharaoh.toml
- data-access and strictness note added: concerns are delegated to atomic skills
- needs.json output location clarified with typical demo path
- implementation tier gains explicit sphinx-build -W rebuild and developer checkpoint after pharaoh-req-codelink-annotate, consistent with other tiers

Signed-off-by: Bartosz Burda <bartoszburda93@gmail.com>
Pull request overview
Adds a new non-atomic orchestrator skill (pharaoh-sdd) intended to enforce gated, spec-driven (V-model) development in sphinx-needs projects by requiring elicitation, per-tier review, sphinx-build -W, and human checkpoints, and exposes it via both Claude skill docs and a GitHub Copilot agent entry.
Changes:
- Added the `skills/pharaoh-sdd/SKILL.md` specification describing the gated tier-by-tier orchestration process and quality gate terminal step.
- Documented the new entry point in `README.md` and `.github/copilot-instructions.md`.
- Added the Copilot agent descriptor `.github/agents/pharaoh.sdd.agent.md`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| skills/pharaoh-sdd/SKILL.md | Defines the new orchestrator’s phases, gating rules, tier loop, and terminal quality gate invocation. |
| README.md | Adds pharaoh-sdd to the “Core workflow” skills list and updates surrounding description text. |
| .github/copilot-instructions.md | Adds @pharaoh.sdd to the “Available Agents” table. |
| .github/agents/pharaoh.sdd.agent.md | Introduces the Copilot agent entry for the new @pharaoh.sdd orchestrator. |
| draft | Dispatch the tier's atomic draft skill once per artefact. The draft skill self-invokes its review and returns `{artefact, review}`. |
| evaluate | Read the attached review. If `overall: fail`, `overall: needs_work`, or any binary axis has `score: 0`, re-dispatch the draft skill with the review action items folded into the description. Use `pharaoh-req-regenerate` for requirements, re-invoke the draft skill directly for arch and vplan. |
| normalise | If the project traces with a generic link field (read from `ubproject.toml [needs.links]` and the existing corpus), rewrite the drafted directive's typed link option (`:satisfies:` or `:verifies:`) to that field. If `ubproject.toml` is absent or has no `[needs.links]` table, keep the typed link as-is. |
| persist | Write the artefact into the docs tree. Write the review JSON into the run directory using the filename convention `<id>_review.json` (for req-review), `<id>_arch_review.json` (for arch-review), and `<id>_vplan_review.json` (for vplan-review). These names are matched by `pharaoh-self-review-coverage-check`. |
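The evaluate rule in the rows above can be sketched in a few lines of Python. This is only an illustration under assumptions: the hunk does not show the review JSON layout, so the `axes` mapping and the function name `needs_redraft` are hypothetical, and the sketch treats every axis as binary.

```python
from typing import Any, Mapping

# Verdicts that trigger a re-dispatch, taken from the evaluate row.
RETRY_VERDICTS = {"fail", "needs_work"}


def needs_redraft(review: Mapping[str, Any]) -> bool:
    """Return True when the draft skill should be re-dispatched.

    Assumed (hypothetical) review shape:
        {"overall": "pass", "axes": {"atomic": {"score": 1}}}
    """
    if review.get("overall") in RETRY_VERDICTS:
        return True
    # Assumption: every entry under "axes" is a binary axis scored 0 or 1.
    axes = review.get("axes", {})
    return any(axis.get("score") == 0 for axis in axes.values())
```

A review that passes overall but scores 0 on a single axis still triggers a redraft, which matches the "any binary axis" wording of the evaluate row.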
Comment on lines +117 to +123
Run `pharaoh-quality-gate` passing `project_root` and the run directory (via
`gate_spec.invariants.self_review_coverage`). The `self_review_coverage` invariant reads the
run directory directly and confirms every drafted artefact has a matching review JSON on
disk. `artefacts_summary_path` is optional and may be omitted here because the pharaoh-sdd
chain runs draft-and-review per tier but does not run `pharaoh-mece` or
`pharaoh-coverage-gap`. The deliverable is a V-model graph where every tier traces to the
next, every artefact has a review on disk, and the build is clean.
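As a rough illustration of what "every drafted artefact has a matching review JSON on disk" could mean, the sketch below checks a run directory for review files. It is not the `pharaoh-self-review-coverage-check` implementation: the function name `missing_reviews`, the explicit list of artefact ids, and the default suffixes (copied from the persist row quoted in this review) are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

# Suffixes as quoted in the persist row; pass your own if the convention differs.
DEFAULT_SUFFIXES = ("_review.json", "_arch_review.json", "_vplan_review.json")


def missing_reviews(
    run_dir: Path,
    artefact_ids: Iterable[str],
    suffixes: Sequence[str] = DEFAULT_SUFFIXES,
) -> List[str]:
    """Return the artefact ids that have no review JSON in run_dir."""
    missing = []
    for artefact_id in artefact_ids:
        # An artefact counts as covered if any accepted filename exists.
        candidates = (run_dir / f"{artefact_id}{suffix}" for suffix in suffixes)
        if not any(path.is_file() for path in candidates):
            missing.append(artefact_id)
    return missing
```

An empty result would correspond to the invariant holding. How the real check discovers the drafted artefact ids is not shown in this hunk, so the sketch takes them as an argument.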
patdhlk approved these changes May 20, 2026
Summary
Adds `pharaoh-sdd`, a process-orchestrator skill that drives a feature from a conversation through the full V-model: requirements, architecture, tests, and implementation. Each stage produces traceable sphinx-needs artefacts.
`pharaoh-sdd` is a non-atomic orchestrator. It does not draft or review artefacts itself. It runs an interactive elicitation phase, then walks the project's V-model tiers, dispatching the existing atomic draft skills (`pharaoh-req-draft`, `pharaoh-arch-draft`, `pharaoh-vplan-draft`), and enforces a human checkpoint plus a `sphinx-build -W` validation after each tier. It closes with `pharaoh-quality-gate`.
It exists to stop a specific failure. Handed "add feature X, do spec-driven development", an unaided agent fabricates the requirements (invented thresholds and acceptance criteria), runs the whole chain unattended, reviews nothing, and builds without `-W`. `pharaoh-sdd` makes elicitation, per-tier review, validation, and a human gate mandatory.
Tier order is derived from the project's tailoring rather than hardcoded, so the skill adapts to any sphinx-needs V-model.
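The gating described above (draft a tier, validate with `sphinx-build -W`, wait for a human checkpoint, then move on) can be pictured with a small Python sketch. It is a simplification under stated assumptions, not the skill's behaviour: `walk_tiers`, `build_is_clean`, the callback parameters and the directory arguments are hypothetical names introduced for this example, and the real skill is a set of agent instructions rather than a script.

```python
import subprocess
from typing import Callable, List, Sequence


def build_is_clean(source_dir: str, out_dir: str) -> bool:
    """Run sphinx-build with -W so that any warning fails the build."""
    result = subprocess.run(["sphinx-build", "-W", source_dir, out_dir])
    return result.returncode == 0


def walk_tiers(
    tiers: Sequence[str],
    draft_tier: Callable[[str], None],
    build_ok: Callable[[], bool],
    approve: Callable[[str], bool],
) -> List[str]:
    """Walk the tiers in order and stop at the first failed gate."""
    completed = []
    for tier in tiers:
        draft_tier(tier)        # dispatch the tier's atomic draft skill
        if not build_ok():      # validation gate: the build must be warning-free
            break
        if not approve(tier):   # human checkpoint before the next tier
            break
        completed.append(tier)
    return completed
```

The tier list is passed in rather than hardcoded, mirroring the point that tier order comes from the project's tailoring. In practice `build_ok` could be `lambda: build_is_clean("docs", "docs/_build")`, where both paths are placeholders.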
Checklist
- `tests/fixtures/basic-project/copilot-instructions.md` updated (if new/renamed agent)
How to test this
- With the Pharaoh skills from this branch available to your agent, work in a sphinx-needs project that has `.pharaoh/project/` tailoring (`tests/fixtures/basic-project/` in this repo, or a fuller project such as `sphinx-needs-demo`).
- Invoke `pharaoh-sdd` and run Phase 0 elicitation. It should ask you for every unstated value rather than inventing one. If it drafts a requirement with a made-up number, that is a failure.
- Each tier should end with a `sphinx-build -W` that must pass, and a checkpoint that waits for your approval before the next tier.
- Expect `pharaoh-quality-gate` to run and the V-model graph to build clean with every tier linked to the next.
What to stress: step 4 onward (drafting through to the quality gate) has not been run end to end, so a full tier-by-tier walk on a real project is the main thing to exercise. Report any tier where the draft, review, build, or checkpoint does not behave as the skill describes.
Testing done so far
Validated by running the spec-driven-development scenario on a sphinx-needs V-model project (`sphinx-needs-demo`), both with and without the skill, on the same feature request.
Without the skill, the agent fabricated the unstated requirement values (light thresholds and hysteresis) and proceeded straight to drafting. With the skill, the agent instead asked the developer for those exact values and stopped at the Phase 0 elicitation gate, drafting nothing until the developer answers.
The repo fixture `tests/fixtures/basic-project/` was not used. Validation was against `sphinx-needs-demo`, which has a fuller multi-tier V-model.