AI Evaluation Framework — From vibes to verification

This repo gives you a straightforward way to tell whether AI-generated content is safe to ship.
It defines what “good” looks like, requires a source for every claim, and runs automatic checks before anything merges.

How it works in one line:
Contract (the shape/rules) → Trusted Sources (High/Medium) → Evidence (a receipt for every fact) → Automatic Checks → Readiness Score → Ship.

Who this is for

Teams that want AI-generated content to be consistent, traceable, and deployment-ready (sales sheets, product specs, FAQs, policies, etc.).

What you get

Method: plain-English playbook and roles.
Templates: JSON schema (the contract), source registry (what counts as High/Medium), prompt examples.
Checks: small scripts and a GitHub check that pass/fail your files automatically.
Examples: a sample “spec” you can test in minutes.
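
To make the contract idea concrete, here is a minimal sketch of a shape check in plain Python. The field names (product, claims, text, source_url, tier) and the check_shape helper are assumptions for illustration only; the actual schema lives in the contracts folder and may look different.

```python
from typing import Any

# Hypothetical contract: these field names are illustrative, not the repo's schema.
REQUIRED_FIELDS = {"product": str, "claims": list}
REQUIRED_CLAIM_FIELDS = {"text": str, "source_url": str, "tier": str}
ALLOWED_TIERS = {"High", "Medium"}


def check_shape(spec: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the spec passes."""
    problems: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(spec.get(name), expected):
            problems.append(f"missing or mistyped field: {name}")

    claims = spec.get("claims")
    if not isinstance(claims, list):
        return problems  # nothing further to inspect

    for i, claim in enumerate(claims):
        if not isinstance(claim, dict):
            problems.append(f"claims[{i}] is not an object")
            continue
        for name, expected in REQUIRED_CLAIM_FIELDS.items():
            if not isinstance(claim.get(name), expected):
                problems.append(f"claims[{i}] missing or mistyped field: {name}")
        tier = claim.get("tier")
        if isinstance(tier, str) and tier not in ALLOWED_TIERS:
            problems.append(f"claims[{i}] has unknown tier: {tier}")
    return problems


good = {
    "product": "Example Mattress",
    "claims": [
        {"text": "Height is 12 inches",
         "source_url": "https://example.com/spec",
         "tier": "High"},
    ],
}
bad = {"product": "Example Mattress", "claims": [{"text": "Height is 12 inches"}]}

print(check_shape(good))  # no problems expected
print(check_shape(bad))   # the claim lacks source_url and tier
```

Returning a list of problems rather than raising on the first one lets a reviewer fix everything in a single pass.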

Quickstart (no command line)

1. Open this repo in GitHub Desktop.
2. Make sure the workflow file is here:
   .github/workflows/validate-specs.yml (this is what runs the checks on pull requests).
3. Create a folder for sample content:
   specs/mattress/tempur-pedic/proadapt/
4. Copy the sample:
   examples/specimen_output.json → specs/mattress/tempur-pedic/proadapt/proadapt-soft-12-queen.json
5. In GitHub Desktop:
   - Create a new branch (e.g., setup-ci), commit your change, then push.
   - Click Create Pull Request.
   GitHub will run the checks automatically. Green = good to merge; red = fix what it tells you.

Think of these checks like spell-check for facts and structure. They verify the file shape, count your High/Medium sources, and compute a readiness score.
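
As a rough illustration of the source count and the readiness score, the sketch below tallies tiers and averages a per-claim weight. The weights (High = 1.0, Medium = 0.5, unsourced = 0) and both function names are hypothetical; the scoring used by this repo's scripts may be different.

```python
from collections import Counter


def count_tiers(claims):
    """Count claims per tier; claims without a tier are counted as 'none'."""
    return Counter(claim.get("tier") or "none" for claim in claims)


def readiness_score(claims, weights=None):
    """Hypothetical score: average per-claim weight, scaled to 0-100."""
    weights = weights or {"High": 1.0, "Medium": 0.5}  # assumed weights
    if not claims:
        return 0.0
    total = sum(weights.get(claim.get("tier"), 0.0) for claim in claims)
    return round(100.0 * total / len(claims), 1)


claims = [
    {"text": "Height is 12 inches", "tier": "High"},
    {"text": "Queen size available", "tier": "High"},
    {"text": "Soft feel", "tier": "High"},
    {"text": "Ships in a box", "tier": "Medium"},
]

print(count_tiers(claims))
print(readiness_score(claims))  # (3 * 1.0 + 1 * 0.5) / 4 * 100 = 87.5
```

Under these assumed weights, the four claims above would clear a minimum score of 85, while adding one unsourced claim would drop the score to 70.0 and fail.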

Quickstart (optional, with command line)

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python scripts/validate_json.py specs contracts/schema.example.json
python scripts/check_evidence.py specs contracts/source_registry.example.yml
python scripts/score_readiness.py specs --min-score 85
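
Checks like these typically signal pass or fail through the process exit code, which is what lets a CI job block a merge. A minimal sketch of that pattern follows, assuming a hypothetical find_invalid_json helper that only verifies that every .json file under a folder parses; the scripts in this repo presumably do more than this.

```python
import json
import sys
from pathlib import Path


def find_invalid_json(root: Path) -> list[str]:
    """Return one message per .json file under root that cannot be parsed."""
    failures = []
    for path in sorted(root.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
            failures.append(f"{path}: {exc}")
    return failures


def main(argv: list[str]) -> int:
    """Return 0 when every file parses and 1 otherwise."""
    root = Path(argv[1]) if len(argv) > 1 else Path("specs")
    failures = find_invalid_json(root)
    for message in failures:
        print(message, file=sys.stderr)
    return 1 if failures else 0


# In a standalone script, finish with: raise SystemExit(main(sys.argv))
```

Keeping the file walk in its own function makes it easy to test without spawning a process; only the last line of a real script needs to touch the exit code.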

