feat: eval consolidation, nested markers, and AAP spec updates by urmzd · Pull Request #2 · urmzd/generative-artifact-protocol

urmzd · 2026-04-03T08:13:43Z

Summary

docs(aap): Split specification into init and maintain phases; add meta field and clearer envelope structure
feat(markers): Support nested target markers with depth counting
feat(apply): Use inclusive range for delete operations
build/test(evals): Update Google provider default, split spec loading, rebuild Rust Python extension, consolidate experiment outputs
fix: Clean up broken outputs and update dependency versions

Test plan

Verify eval experiments still run correctly with consolidated outputs
Test nested marker parsing with depth counting
Confirm inclusive range delete operations work as expected
Review AAP spec changes for correctness

Separate AAP specification into two focused documents:
- aap-spec-init.md: guidance for generating markup with target markers during artifact creation
- aap-spec-maintain.md: guidance for responding with edit envelopes during artifact maintenance

This improves clarity on agent responsibilities based on the artifact lifecycle phase.

Update the main AAP specification to reflect metadata structure changes. The operation field is replaced with a meta object that groups format and other metadata. This aligns the spec with the actual implementation and makes the envelope structure clearer for edit operations.

Implement proper handling of nested <aap:target> markers by tracking nesting depth rather than matching the first closing tag. This allows targets to safely contain other targets. Adds find_matching_close() function to locate the correct closing tag when multiple targets are nested. Applies to both Rust (src/markers.rs) and Python (evals/src/aap_evals/markers.py) implementations. Includes additional test for nested target outer extraction.
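The depth-counting idea described above can be sketched in Python as follows. This is a minimal illustration, not the code from evals/src/aap_evals/markers.py: the `id="..."` attribute form of the opening tag, the helper `extract_target`, and the exact signature of `find_matching_close` are assumptions made for the example.

```python
import re
from typing import Optional

# Assumed marker syntax: <aap:target id="..."> ... </aap:target>
TOKEN_RE = re.compile(r'<aap:target\b[^>]*>|</aap:target>')
CLOSE_TAG = "</aap:target>"


def find_matching_close(text: str, start: int) -> Optional[int]:
    """Return the index of the close tag matching an open tag whose content starts at `start`.

    Depth starts at 1 (we are already inside one target). Each nested open tag
    increments it, each close tag decrements it; the match is where it hits 0.
    """
    depth = 1
    for token in TOKEN_RE.finditer(text, start):
        if token.group() == CLOSE_TAG:
            depth -= 1
            if depth == 0:
                return token.start()
        else:
            depth += 1
    return None  # unbalanced markers


def extract_target(text: str, target_id: str) -> Optional[str]:
    """Hypothetical helper: return the content between the markers of `target_id`."""
    opener = re.search(r'<aap:target\s+id="' + re.escape(target_id) + r'"\s*>', text)
    if opener is None:
        return None
    close = find_matching_close(text, opener.end())
    if close is None:
        return None
    return text[opener.end():close]
```

Matching the first closing tag would cut an outer target short at the inner target's close; counting depth keeps the inner target intact inside the outer content.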

Refactor apply_edit to handle delete operations differently from other operations. Delete now uses inclusive range (markers included) while replace/insert operations use exclusive range (markers excluded). Add find_by_id_inclusive() to Resolve trait and TextResolver to support this distinction. Add resolve_target_inclusive() helper function. Restructure operation matching to split delete from other op types. Update test assertion to reflect that delete removes both content and markers completely.
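The inclusive versus exclusive distinction can be illustrated with a small Python sketch. The real change lives in the Rust Resolve trait and TextResolver; the function signatures, the `id="..."` marker syntax, and the string-based op names below are assumptions for illustration only, and only replace and delete are shown.

```python
import re
from typing import Optional, Tuple

_TOKEN = re.compile(r'<aap:target\b[^>]*>|</aap:target>')


def _spans(text: str, target_id: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (open_start, open_end, close_start, close_end) for a target, or None."""
    opener = re.search(r'<aap:target\s+id="' + re.escape(target_id) + r'"\s*>', text)
    if opener is None:
        return None
    depth = 1  # depth counting so nested targets do not end the span early
    for tok in _TOKEN.finditer(text, opener.end()):
        depth += -1 if tok.group().startswith("</") else 1
        if depth == 0:
            return opener.start(), opener.end(), tok.start(), tok.end()
    return None


def resolve_target(text: str, target_id: str) -> Optional[Tuple[int, int]]:
    """Exclusive range: only the content between the markers."""
    s = _spans(text, target_id)
    return None if s is None else (s[1], s[2])


def resolve_target_inclusive(text: str, target_id: str) -> Optional[Tuple[int, int]]:
    """Inclusive range: the markers and everything between them."""
    s = _spans(text, target_id)
    return None if s is None else (s[0], s[3])


def apply_edit(text: str, op: str, target_id: str, content: str = "") -> str:
    """Apply a delete (inclusive range) or replace (exclusive range) to one target."""
    if op == "delete":
        rng = resolve_target_inclusive(text, target_id)
    elif op == "replace":
        rng = resolve_target(text, target_id)
    else:
        raise ValueError(f"unsupported op in this sketch: {op}")
    if rng is None:
        raise KeyError(f"target not found: {target_id}")
    start, end = rng
    return text[:start] + content + text[end:]
```

With this split, a replace leaves the markers in place so the target stays addressable for later edits, while a delete removes the target entirely and leaves no empty marker pair behind.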

Update evals configuration:
- Change default Google provider from gemini-3.1-flash-lite-preview to gemini-2.5-flash
- Load separated AAP specification files (init and maintain) instead of single spec file
- Pass appropriate spec to each agent based on task phase

This aligns with the split AAP specification and uses more current model defaults.
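One way the phase-based spec selection could look, as a rough Python sketch. Only the two file names come from this PR; `load_specs`, `spec_for_phase`, the phase keys, and the directory argument are hypothetical names chosen for the example, not the evals package API.

```python
from pathlib import Path
from typing import Dict

# File names are taken from the PR description; the phase keys are assumed.
SPEC_FILES = {
    "init": "aap-spec-init.md",
    "maintain": "aap-spec-maintain.md",
}


def load_specs(spec_dir: str) -> Dict[str, str]:
    """Read both spec documents once and key them by lifecycle phase."""
    base = Path(spec_dir)
    return {
        phase: (base / name).read_text(encoding="utf-8")
        for phase, name in SPEC_FILES.items()
    }


def spec_for_phase(specs: Dict[str, str], phase: str) -> str:
    """Pick the spec text an agent should receive for the given phase."""
    if phase not in specs:
        raise ValueError(f"unknown phase: {phase}")
    return specs[phase]
```

An agent creating an artifact would receive `spec_for_phase(specs, "init")`, and an agent editing an existing artifact would receive `spec_for_phase(specs, "maintain")`.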

Rebuild compiled Python extension following updates to marker handling logic. Recompilation captures nested target marker support.

Clean up experiment data directory by removing old evaluation results and intermediate outputs. Consolidate experiment runs, keeping only the final turn outputs and updated HTML artifacts. Removes eval.json and metrics.json files for experiments that have been consolidated. Removes intermediate turn outputs (turns 1-4 for aap, turns 3-4 for base runs) that are no longer needed. Updates remaining artifacts with new evaluation results.

Update Cargo.lock with resolved dependency versions following changes to project dependencies.

urmzd added 9 commits April 3, 2026 03:12

build(evals): rebuild rust python extension after marker changes

32bd03c

build(deps): update cargo.lock with new dependency versions

8e233fe

fix: cleanup broken outputs

012ca3a

urmzd merged commit d1158a3 into main Apr 3, 2026
1 check passed

urmzd deleted the feat/evals-and-markers branch April 3, 2026 08:15

urmzd merged 9 commits into main from feat/evals-and-markers