Determinism guarantees — byte-stable output for every emitter and command #74

@tonydspaniard

Goal

Make determinism a first-class quality of every framework command and emitter: the same inputs produce byte-identical outputs across runs and machines, on a pinned PHP minor version (see Out of scope). This is the foundation that lets agents iterate confidently. Without it, agents see noise where there is no real change, and rewind/replay/check workflows become unreliable.

This is a tracking / specification issue, not a single deliverable. It defines the standard, lists the affected packages, and tracks compliance.

Why

Agents take diff output as a signal. If a command produces a different file on every invocation (random ordering, timestamps, generated comments), the agent assumes meaningful drift and tries to "fix" the non-issue. Mistaken edits cascade.

When everything is deterministic:

  • bin/altair manifest:generate twice = no change → agent moves on
  • bin/altair spec scaffold twice on same spec = same output → safe to re-run
  • bin/altair spec emit-openapi produces the same YAML byte-for-byte → CI can diff against committed copy
  • Test outputs are reproducible → flake detection actually works

The standard

A command/emitter is deterministic when, given the same project state and inputs, it produces:

  1. Byte-identical output files — same content, same line endings, same trailing newlines
  2. Stable ordering — anywhere arrays/sets/dicts are iterated, sort by a stable key (alphabetical, numeric ID, etc.) before emitting
  3. No wall-clock timestamps inside emitted content — except a single explicit generated_at field where one is needed
  4. No machine identifiers — no hostname, no username, no absolute paths
  5. No nondeterministic randomness — UUIDs in generated code use a stable deterministic source (e.g. spec SHA → uuid5) or are absent
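Rule 1 is easiest to enforce at the single point where an emitter writes its output. A minimal sketch, assuming a hypothetical finalizeEmittedContent() helper that every emitter would call before writing: it normalizes CRLF and lone CR to LF and ends the file with exactly one newline, so templates, OS, and editor settings cannot leak into the bytes.

```php
<?php

/**
 * Hypothetical helper: normalize emitted content so line endings and the
 * trailing newline never depend on the template, OS, or editor that produced it.
 */
function finalizeEmittedContent(string $content): string
{
    // Convert Windows (CRLF) and old Mac (CR) line endings to LF.
    // CRLF is replaced first; otherwise it would turn into two LFs.
    $content = str_replace(["\r\n", "\r"], "\n", $content);

    // Drop all trailing newlines, then append exactly one.
    return rtrim($content, "\n") . "\n";
}

// The same logical content arrives with different endings; the result is identical.
$a = finalizeEmittedContent("line one\r\nline two\r\n\r\n");
$b = finalizeEmittedContent("line one\nline two");
var_dump($a === $b); // bool(true)
```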

Affected packages and current status

| Package | Commands/emitters | Determinism status |
|---|---|---|
| univeros/agent-spec (#18) | manifest:generate | required for v1 |
| univeros/scaffold (#19) | spec scaffold, spec emit-openapi, spec emit-sdk | required for v1 |
| univeros/scaffold (#19) | Generated PHP files (Actions, Inputs, Responders, tests) | required for v1 |
| univeros/persistence (#20) | Entity emitter, migration emitter | required for v1 |
| univeros/messaging (#21) | Job + handler emitter | required for v1 |
| SDK emitters (#22) | TypeScript, Python output | required for v1 |
| bin/altair doctor (#23) | --format=json output | required for diffable check results |
| univeros/mcp (#24) | All tool responses | required for stable agent reasoning |
| Test reporter (#25) | --format=json output | required where the comparable parts of the report are equal |
| Introspection (#26) | All --format=json outputs | required |

Implementation patterns

Stable ordering

// before
foreach ($container->getBindings() as $id => $binding) { ... }

// after
$bindings = $container->getBindings();
ksort($bindings, SORT_STRING);  // byte-wise key order, independent of locale and numeric-string rules
foreach ($bindings as $id => $binding) { ... }

Apply this anywhere we iterate a map or array whose insertion order is not itself stable. PHP arrays always iterate in insertion order, so the real bug is an insertion order that depends on something unstable, such as filesystem traversal (see the next pattern).
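ksort() only sorts the top level. For nested structures that end up as JSON (manifests, --format=json output), a recursive variant is needed. A minimal sketch, assuming hypothetical sortKeysRecursive() and encodeStableJson() helpers: associative arrays get their keys sorted byte-wise, while lists keep their order, because list order is usually meaningful and should be sorted explicitly by a stable key before encoding. array_is_list() requires PHP 8.1 or later.

```php
<?php

/** Recursively sort associative-array keys; leave list order untouched. */
function sortKeysRecursive(mixed $value): mixed
{
    if (!is_array($value)) {
        return $value;
    }
    foreach ($value as $key => $child) {
        $value[$key] = sortKeysRecursive($child);
    }
    if (!array_is_list($value)) {
        // SORT_STRING compares keys as byte strings (integer keys by their
        // decimal form), so the order never depends on locale.
        ksort($value, SORT_STRING);
    }
    return $value;
}

/** Hypothetical helper: stable, pretty-printed JSON with exactly one trailing newline. */
function encodeStableJson(mixed $data): string
{
    $json = json_encode(
        sortKeysRecursive($data),
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
    return $json . "\n";
}

// Same data, different insertion order: identical bytes.
$first  = encodeStableJson(['services' => ['mailer' => 1, 'cache' => 2], 'name' => 'app']);
$second = encodeStableJson(['name' => 'app', 'services' => ['cache' => 2, 'mailer' => 1]]);
var_dump($first === $second); // bool(true)
```

Lists are deliberately left alone: silently sorting them would hide a real ordering decision that belongs in the emitter.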

Filesystem traversal

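// before: SCANDIR_SORT_NONE (like readdir(), or glob() with GLOB_NOSORT) returns
// entries in the filesystem's own directory order, which varies by filesystem
$files = scandir($dir, SCANDIR_SORT_NONE);

// after: sort explicitly and byte-wise, so the order never depends on the
// filesystem or on the current locale
$files = scandir($dir);
sort($files, SORT_STRING);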

Symfony's Finder (Symfony\Component\Finder\Finder) returns results in filesystem order unless a sort is requested, so every use of it for code generation calls ->sortByName() explicitly.
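When a generator walks a directory tree, both the order and the form of the paths matter, since rule 4 forbids absolute paths in output. A minimal sketch using only the standard library; listSourceFiles() is a hypothetical helper, not an existing framework API. It returns paths relative to the root, always with '/' separators, sorted byte-wise.

```php
<?php

/**
 * Hypothetical helper: list all files under $root as sorted, '/'-separated
 * paths relative to $root, so output never depends on FS order or machine paths.
 *
 * @return list<string>
 */
function listSourceFiles(string $root): array
{
    $root = rtrim($root, '/\\');
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    $paths = [];
    foreach ($iterator as $file) {
        /** @var SplFileInfo $file */
        if (!$file->isFile()) {
            continue;
        }
        // Strip the machine-specific root prefix and normalize separators.
        $relative = substr($file->getPathname(), strlen($root) + 1);
        $paths[] = str_replace('\\', '/', $relative);
    }

    sort($paths, SORT_STRING); // byte-wise, locale-independent
    return $paths;
}

// Demo: files created in an arbitrary order come back in one fixed order.
$root = sys_get_temp_dir() . '/determinism-demo-' . bin2hex(random_bytes(4));
mkdir($root . '/sub', 0777, true);
foreach (['zeta.php', 'sub/alpha.php', 'Beta.php'] as $name) {
    file_put_contents($root . '/' . $name, "<?php\n");
}
print_r(listSourceFiles($root)); // Beta.php, sub/alpha.php, zeta.php (uppercase sorts first)
```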

Timestamps

Reserve a single explicit generated_at field per artifact when a timestamp is genuinely useful. Everywhere else: omit. Never inline gmdate(...) into the body of an emitted file's heredoc.
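A wall-clock generated_at value changes on every run, so any "regenerate and compare" step has to exclude that one field and nothing else. A minimal sketch, assuming the artifact is a JSON object with generated_at at the top level; sameIgnoringGeneratedAt() is a hypothetical helper. It is a structural comparison, so it does not catch whitespace-only changes; a byte-level diff still has to cover those.

```php
<?php

/**
 * Hypothetical helper: true when two JSON artifacts are identical except for
 * their top-level generated_at field. Assumes both documents are JSON objects.
 */
function sameIgnoringGeneratedAt(string $before, string $after): bool
{
    $a = json_decode($before, true, 512, JSON_THROW_ON_ERROR);
    $b = json_decode($after, true, 512, JSON_THROW_ON_ERROR);
    unset($a['generated_at'], $b['generated_at']);

    // === on arrays also requires identical key order and value types,
    // so reordered keys or list items still count as drift.
    return $a === $b;
}

$run1 = '{"generated_at":"2025-01-01T00:00:00Z","routes":["a","b"]}';
$run2 = '{"generated_at":"2025-01-02T09:30:00Z","routes":["a","b"]}';
$run3 = '{"generated_at":"2025-01-02T09:30:00Z","routes":["b","a"]}';
var_dump(sameIgnoringGeneratedAt($run1, $run2)); // bool(true): only the timestamp changed
var_dump(sameIgnoringGeneratedAt($run1, $run3)); // bool(false): route order changed
```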

Random IDs

Where a UUID is needed (e.g. seed data for tests), derive it from the spec SHA:

use Ramsey\Uuid\Uuid; // assumes ramsey/uuid, which provides Uuid::uuid5() and Uuid::NAMESPACE_OID

$seedUuid = Uuid::uuid5(Uuid::NAMESPACE_OID, $spec->sha256 . ':' . $field);

UUIDv5 is name-based: it is a SHA-1 hash of the namespace and the name, so the same spec SHA and field always produce the same UUID.
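To show why this is stable, here is UUIDv5 written out with only the standard library, following RFC 4122 section 4.3: SHA-1 over the namespace bytes plus the name, truncated to 16 bytes, with the version and variant bits set. The uuid5() function below is a hypothetical stand-in for illustration; a project that already uses a UUID library would keep calling that library.

```php
<?php

/**
 * Hypothetical helper: RFC 4122 version-5 (name-based, SHA-1) UUID.
 * Same namespace + same name always yields the same UUID.
 */
function uuid5(string $namespace, string $name): string
{
    // Namespace UUID as 16 raw bytes.
    $nsBytes = hex2bin(str_replace('-', '', $namespace));
    if ($nsBytes === false || strlen($nsBytes) !== 16) {
        throw new InvalidArgumentException("Invalid namespace UUID: $namespace");
    }

    // SHA-1 of the namespace bytes followed by the name; keep the first 16 bytes.
    $bytes = substr(sha1($nsBytes . $name, true), 0, 16);

    // Version 5 in the high nibble of byte 6; RFC 4122 variant (10xx) in byte 8.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x50);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    $hex = bin2hex($bytes);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

const NAMESPACE_OID = '6ba7b812-9dad-11d1-80b4-00c04fd430c8'; // RFC 4122 Appendix C

$specSha = hash('sha256', "openapi: 3.1.0\n"); // stand-in for a real spec's SHA-256
var_dump(uuid5(NAMESPACE_OID, $specSha . ':email') === uuid5(NAMESPACE_OID, $specSha . ':email')); // bool(true)
```

Because the only inputs are the namespace and the name, changing the spec (and therefore its SHA) is the only way the seed UUIDs change.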

Code-generation headers

Every generated file gets a header that includes only deterministic fields:

<?php

// @generated by Altair scaffold from api/users/create.yaml
// @spec-sha 4f3a8b…  (truncated to 12 chars)
// @scaffolder-version 1.2.0
//
// Editing this file directly takes it out of sync with the spec contract, and
// `bin/altair spec lint` will report the drift. Prefer editing
// api/users/create.yaml and re-running `bin/altair spec scaffold`.

No timestamp, no machine name, no user.
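A minimal sketch of a header renderer that can only emit deterministic fields, assuming its inputs are the spec's project-relative path, the spec's raw contents, and the scaffolder version; renderGeneratedHeader() is hypothetical, and the advisory comment lines are left out for brevity. It never reads a clock, hostname, or user, so it cannot leak them.

```php
<?php

/**
 * Hypothetical helper: build the @generated header from deterministic inputs only.
 * $specPath must be project-relative (e.g. "api/users/create.yaml"), never absolute.
 */
function renderGeneratedHeader(string $specPath, string $specContents, string $version): string
{
    // Reject absolute POSIX paths and Windows drive paths (rule 4).
    if (str_starts_with($specPath, '/') || preg_match('#^[A-Za-z]:[\\\\/]#', $specPath) === 1) {
        throw new InvalidArgumentException("Spec path must be project-relative: $specPath");
    }

    // SHA-256 of the spec, truncated to 12 hex chars for the header.
    $shortSha = substr(hash('sha256', $specContents), 0, 12);

    $lines = [
        '<?php',
        '',
        "// @generated by Altair scaffold from $specPath",
        "// @spec-sha $shortSha",
        "// @scaffolder-version $version",
        '',
    ];

    // LF only; no timestamp, hostname, or user anywhere.
    return implode("\n", $lines) . "\n";
}

echo renderGeneratedHeader('api/users/create.yaml', "openapi: 3.1.0\n", '1.2.0');
```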

CI enforcement

A new check on the framework's own CI: "determinism gate."

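- name: Determinism gate
  run: |
    # Pass 1: regenerate and compare against the committed copies
    bin/altair manifest:generate
    bin/altair spec scaffold api/ --force
    bin/altair spec emit-openapi > /tmp/openapi-1.yaml
    git diff --exit-code .agent/ src/App/ docs/openapi/ \
      || { echo "Regenerated content differs from the committed copy (stale or non-deterministic output)"; exit 1; }

    # Pass 2: regenerate again and compare against pass 1
    bin/altair manifest:generate
    bin/altair spec scaffold api/ --force
    bin/altair spec emit-openapi > /tmp/openapi-2.yaml
    git diff --exit-code .agent/ src/App/ docs/openapi/ \
      || { echo "Second regeneration changed output: non-determinism detected"; exit 1; }
    diff -u /tmp/openapi-1.yaml /tmp/openapi-2.yaml \
      || { echo "emit-openapi output differs between runs: non-determinism detected"; exit 1; }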

If a PR breaks determinism, CI catches it. The same gate runs in user projects: the skeleton ships with this workflow file.

Acceptance criteria

This issue is complete when:

  • Each of the affected packages has explicit determinism notes in its composer.json description and per-package README
  • Every shipped emitter passes a "run twice, diff = empty" test, baked into the package's PHPUnit suite
  • CI in univeros/framework runs the determinism gate on every PR
  • The skeleton (re #15 docs: Courier package guide #28) ships a determinism workflow that runs on user projects
  • bin/altair doctor includes a determinism_check (regenerate, diff, exit non-zero on drift)
  • Documentation in AGENT.md explicitly states the determinism standard for any new package added to the framework
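The "run twice, diff = empty" test can share one small harness across packages. A minimal, framework-agnostic sketch, assuming an emitter can be invoked as a callable that returns a map of relative path to file contents (a hypothetical shape, not an existing framework interface). Inside PHPUnit, the returned list would feed a single assertSame([], ...).

```php
<?php

/**
 * Hypothetical harness: run an emitter several times and report every path
 * whose bytes differ between runs, or that appears in only some runs.
 *
 * @param callable(): array<string, string> $emit  returns [relativePath => contents]
 * @return list<string> offending paths, sorted; empty means deterministic
 */
function findNonDeterministicPaths(callable $emit, int $runs = 2): array
{
    $baseline = $emit();
    $offending = [];

    for ($i = 1; $i < $runs; $i++) {
        $next = $emit();
        $allPaths = array_unique(array_merge(array_keys($baseline), array_keys($next)));
        foreach ($allPaths as $path) {
            // A file missing from one run counts as drift too.
            if (($baseline[$path] ?? null) !== ($next[$path] ?? null)) {
                $offending[$path] = true;
            }
        }
    }

    $paths = array_keys($offending);
    sort($paths, SORT_STRING);
    return $paths;
}

// A stable emitter, and one that leaks a random value into its output.
$stable = fn (): array => ['src/App/User.php' => "<?php\n// @generated\n"];
$leaky  = fn (): array => [
    'src/App/User.php' => "<?php\n// @generated\n",
    'seed.json'        => json_encode(['id' => bin2hex(random_bytes(8))]) . "\n",
];

var_dump(findNonDeterministicPaths($stable)); // empty array
var_dump(findNonDeterministicPaths($leaky));  // lists seed.json
```

Reporting paths rather than a boolean keeps the failure message actionable: the offending emitter output is named directly.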

Out of scope

  • Bit-reproducible builds across PHP minor versions (we accept that PHP 8.3 → 8.4 may produce different array sort orders for edge cases; we lock CI to one PHP minor and document it)
  • Cross-platform output (we ship LF line endings everywhere; Windows users can configure their editor, and git already handles checkout conversion through .gitattributes)
  • Cryptographic reproducibility / signed artifacts (separate, possible follow-up)

Dependencies

This is cross-cutting — it depends on every package that emits content (#18, #19, #20, #21, #22, #68, #69, #70, #71). Best treated as a standing label / acceptance criterion applied to each PR in those issues, rather than something to "complete" in one go. Track here, enforce in each PR.

Why this is the boring-but-load-bearing issue

The other issues add features. This one adds trust. An agent that can re-run any command without fear of accidentally introducing changes is dramatically faster than one that can't. Most frameworks don't make this promise: neither Symfony's makers nor Laravel's artisan make commands treat byte-stable output as a guarantee. Owning this standard from day one is a small differentiator on its own.
