Releases · markmhendrickson/writ

Summary

WRIT now benchmarks the failure modes that actually break long-lived AI memory in production: source-authority overwrites, extraction drift, system-level write failures, fact lifecycle transitions, and pre-delivery integrity checks.

Why This Release Exists

This release expands WRIT in direct response to failure modes surfaced in the r/AIMemory discussion "No AI memory benchmark tests what actually breaks." That discussion highlighted five recurring gaps in real-world memory systems: lower-authority writes overwriting user-stated facts, extraction drift creating near-duplicate records, flush/restart failures corrupting state, facts lacking explicit superseded/expired lifecycle handling, and systems returning stale or conflicting state without certifying integrity before delivery.

Discussion sources (Reddit):

Related write-up (same title, off-Reddit mirror): No AI memory benchmark tests what actually breaks.

These failure classes map directly to the new WRIT dimensions:

User correction overwritten by later summaries -> trust_hierarchy
Same fact re-extracted in slightly different forms -> extraction_drift
Flush/reset/stale-context corruption -> failure_injection
Superseded, expired, and reinstated facts -> lifecycle
Detect stale/conflicting state before answering -> certification

What changed for WRIT users

Added 5 new benchmark dimensions: trust_hierarchy, extraction_drift, failure_injection, lifecycle, and certification.
Added 25 new scenarios across those dimensions, expanding the benchmark dataset from 52 to 77 scenarios.
Added closure coverage to capture resolved-vs-discussed state, including superseded policy and pricing decision scenarios.
Expanded aggregate reporting with new metrics for source authority integrity, dedup accuracy, failure resilience, lifecycle accuracy, and pre-delivery detection.

API surface and contracts

Extended ScenarioCategory, RequiredCapability, FailureMode, ScenarioScores, and AggregateScores in the TypeScript API.
Added source_authority on memory events.
Added lifecycle_history, expected_entity_count, and expected_integrity_flag to scenario ground truth.
Extended adapter capabilities with support declarations for source authority, deduplication, lifecycle tracking, and pre-delivery certification.
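
As a rough illustration of where the new fields sit, the sketch below declares minimal shapes for a memory event and for scenario ground truth. Only the field names source_authority, lifecycle_history, expected_entity_count, and expected_integrity_flag come from these notes; the interface names, value unions, and every other field are assumptions made for the example and may not match the actual WRIT types.

```typescript
// Hypothetical shapes for illustration only. The four field names below are
// taken from the release notes; all value types and other fields are assumed.
type SourceAuthority = "user_stated" | "agent_inferred" | "system_summary"; // assumed values

interface MemoryEventSketch {
  id: string; // assumed
  content: string; // assumed
  source_authority?: SourceAuthority; // field name from the release notes
}

// States mirror the "superseded, expired, and reinstated" wording; "active" is assumed.
type LifecycleState = "active" | "superseded" | "expired" | "reinstated";

interface LifecycleEntrySketch {
  state: LifecycleState;
  at_event: string; // assumed: id of the event that caused the transition
}

interface GroundTruthSketch {
  lifecycle_history?: LifecycleEntrySketch[];
  expected_entity_count?: number;
  expected_integrity_flag?: boolean;
}

// A user-stated fact that a later, lower-authority summary should not overwrite.
const exampleEvent: MemoryEventSketch = {
  id: "evt-1",
  content: "User prefers metric units",
  source_authority: "user_stated",
};

// Ground truth for a scenario where re-extraction must still yield one entity.
const exampleTruth: GroundTruthSketch = {
  lifecycle_history: [{ state: "active", at_event: "evt-1" }],
  expected_entity_count: 1,
  expected_integrity_flag: false,
};
```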

Behavior changes

The benchmark can now distinguish retrieval failures from write-authority failures, dedup failures, lifecycle blindness, and certification misses.
Markdown and JSON reports now surface the new aggregate metrics and category-level breakdowns.
Scenario validation now recognizes the new categories and enforces category-specific structural requirements.
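
A category-specific structural check might look something like the sketch below. The pairing of each category with a required field is a guess based on the field names listed under "API surface and contracts"; validateScenarioSketch and ScenarioLike are hypothetical names, not the validator that ships with WRIT.

```typescript
// Hypothetical validator sketch. Which field each category requires is assumed.
type NewCategory =
  | "trust_hierarchy"
  | "extraction_drift"
  | "failure_injection"
  | "lifecycle"
  | "certification";

interface ScenarioLike {
  category: NewCategory;
  events: { source_authority?: string }[];
  ground_truth: {
    lifecycle_history?: unknown[];
    expected_entity_count?: number;
    expected_integrity_flag?: boolean;
  };
}

// Returns a list of human-readable problems; an empty list means the scenario passed.
function validateScenarioSketch(s: ScenarioLike): string[] {
  const errors: string[] = [];
  switch (s.category) {
    case "trust_hierarchy":
      if (!s.events.some((e) => e.source_authority !== undefined)) {
        errors.push("trust_hierarchy: no event declares source_authority");
      }
      break;
    case "extraction_drift":
      if (s.ground_truth.expected_entity_count === undefined) {
        errors.push("extraction_drift: missing expected_entity_count");
      }
      break;
    case "lifecycle":
      if (!s.ground_truth.lifecycle_history || s.ground_truth.lifecycle_history.length === 0) {
        errors.push("lifecycle: missing lifecycle_history");
      }
      break;
    case "certification":
      if (s.ground_truth.expected_integrity_flag === undefined) {
        errors.push("certification: missing expected_integrity_flag");
      }
      break;
    case "failure_injection":
      // No extra structural requirement is assumed for this category.
      break;
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a scenario author see every structural issue in a single pass.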

Docs site & CI / tooling

Updated WRIT docs for authoring, metrics, and adapter implementation to cover the new dimensions and scenario fields.
Fixed a CLI import side effect where report.ts executed on module import, which broke the GitHub Actions benchmark-baseline workflow when cli.ts imported generateMarkdownReport.
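
One common way to avoid this kind of import side effect is an entry-point guard, sketched below. It is not necessarily the guard used in report.ts: isEntryPoint is a hypothetical helper, the body of generateMarkdownReport is a stand-in, and the approach assumes process.argv[1] holds the path of the script that was launched.

```typescript
import { pathToFileURL } from "node:url";

// Stand-in for the real report generator; only the function name comes from the notes.
export function generateMarkdownReport(resultsDir: string): string {
  return `# Report for ${resultsDir}`;
}

// Hypothetical helper: true only when the module at moduleUrl is the script
// that was launched directly, rather than a module imported by another file.
export function isEntryPoint(moduleUrl: string, argv1: string | undefined): boolean {
  return argv1 !== undefined && moduleUrl === pathToFileURL(argv1).href;
}

// In an ES module, the CLI body would then sit behind the guard:
//
//   if (isEntryPoint(import.meta.url, process.argv[1])) {
//     console.log(generateMarkdownReport(process.argv[2] ?? "results"));
//   }
//
// With that in place, importing generateMarkdownReport from cli.ts no longer
// runs the report CLI or parses cli.ts's own arguments.
```

A plain URL comparison does not resolve symlinks, so a launcher that invokes the script through a symlinked path may need to normalize both sides first.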

Internal changes

Refactored evaluator scoring and aggregation to support dimension-specific metrics while preserving null-skipping behavior for unsupported adapter capabilities.
Updated built-in adapters to advertise the extended capability surface.
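
The null-skipping behavior can be pictured with a small aggregation helper. This is a minimal sketch assuming that a null score means the adapter did not declare the capability a scenario needs; meanSkippingNulls is a hypothetical name, and the real evaluator may weight or group scores differently.

```typescript
// Hypothetical helper: average only the scores that were actually produced.
// A null entry is assumed to mean "adapter does not support this capability".
function meanSkippingNulls(scores: (number | null)[]): number | null {
  const present = scores.filter((s): s is number => s !== null);
  if (present.length === 0) {
    // Nothing was scored, so report "not applicable" rather than 0.
    return null;
  }
  return present.reduce((sum, s) => sum + s, 0) / present.length;
}

// An adapter without lifecycle tracking is skipped, not penalized.
const lifecycleAccuracy = meanSkippingNulls([null, null]);
const dedupAccuracy = meanSkippingNulls([1, null, 0.5]);
```

Returning null instead of 0 when nothing was scored keeps an unsupported dimension from dragging down an adapter's aggregate.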

Fixes

Fixed the benchmark CLI / report interaction so that npx tsx src/cli.ts --adapter baseline ... no longer crashes because --adapter was being treated as a results directory.

Tests and validation

npx tsc --noEmit
npx vitest run
Local reproduction of the baseline benchmark CLI run
Local verification that standalone src/report.ts CLI still works after the entry-point guard fix

Breaking changes

None, but custom adapters and any code that exhaustively matches scenario categories, required capabilities, failure modes, or aggregate score keys must be updated for the expanded WRIT type surface.
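
Code that switches exhaustively over categories is the most likely place for a compile error after upgrading. The sketch below shows the usual never-based exhaustiveness pattern over the five new dimension names. The pairing of each dimension with a metric label is inferred from the order in which these notes list them and is an assumption, as are the names NewDimension, metricLabelFor, and assertNever; the real ScenarioCategory union also contains the pre-existing categories, which are not shown here.

```typescript
// Only the five new dimension names are taken from the release notes.
type NewDimension =
  | "trust_hierarchy"
  | "extraction_drift"
  | "failure_injection"
  | "lifecycle"
  | "certification";

// Compile-time exhaustiveness check: if a new member is added to the union and
// not handled below, the call in the default branch stops type-checking.
function assertNever(value: never): never {
  throw new Error(`Unhandled category: ${String(value)}`);
}

// Hypothetical mapping from dimension to the aggregate metric it feeds (assumed pairing).
function metricLabelFor(dimension: NewDimension): string {
  switch (dimension) {
    case "trust_hierarchy":
      return "source authority integrity";
    case "extraction_drift":
      return "dedup accuracy";
    case "failure_injection":
      return "failure resilience";
    case "lifecycle":
      return "lifecycle accuracy";
    case "certification":
      return "pre-delivery detection";
    default:
      return assertNever(dimension);
  }
}
```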

v0.2.0
