Skip to content

ahmedanees-m/pen-stack

Repository files navigation

PEN-STACK

A verification and grounding layer for genome-writing AI

Foundation models generate candidate edits; PEN-STACK checks them. It tells you where in the genome a write can be made safely and durably, which enzyme can make it, and how to design the write end to end. Every design is checked against rule-grounded mechanism, returned with calibrated confidence and its provenance, and marked "out of scope" rather than guessed. Numbers come from validated tools, not from a language model.

PyPI CI Publish codecov License: MIT Python 3.11+ Docker

Overview

PEN-STACK is an installable Python package and service for the genome-writing era, the modality that installs new information into a genome (inserting genes, flipping or excising kilobases, placing programmable landing pads) rather than editing a base in place. Writing is harder and less tooled than editing, and it is gated by questions that have no canonical answer: where can you write, what can write there, and how should the write be designed.

The package consolidates five earlier research projects into one citable stack and adds the reference maps and the design engine the field was missing. It runs on a single GPU, uses bulk-downloadable public data, and is validated against pre-registered baselines.

See the CHANGELOG for release history and docs/ for the full documentation.

What it addresses

Question The situation today What PEN-STACK provides
Where can you write? Labs re-derive ad-hoc "safe harbour" shortlists from inconsistent criteria; published lists range from thousands of sites to a few dozen, rarely predict expression durability, and usually cover one cell type. The Writable Genome: a learned, cell-type-aware, writer-aware atlas scoring every locus for safety (genotoxicity risk), durability (whether a cassette stays expressed), and reachability (which enzyme can engage it).
What can write there, and how well? Enzyme capabilities are scattered across papers, with no catalogue placing the genome-writing families on common measured axes with their targeting requirements. The Writer Atlas: 33,370 enzyme systems across 8 families on common measured axes, joined to the Writable Genome by a bidirectional cross-link.
How do I design the write? Destination, enzyme, cargo, and delivery are interdependent and goal-dependent, and no tool optimises them together. The Write Planner: inverse design that, given a goal and an edit intent, returns ranked, traceable site, writer, cargo, and delivery plans.
Where might a bridge-recombinase design go off-target? Bridge recombinases are highly programmable but had no genome-wide off-target screening tool. A bridge off-target engine that nominates and ranks candidate off-target locations from measured data (a screen, not a per-site risk calculator).

Architecture

A goal enters through one of the interfaces. The agent layer turns it into candidate designs. Every candidate passes through the verifier, the central gate: it runs a biosecurity screen first, checks the design against the rule base, attaches a calibrated confidence and a per-axis immune-risk profile, and discards anything that is unsafe, illegal, or uncalibrated. The reference layers and the oracle mesh supply the grounded answers the verifier and planner rely on, and everything rests on public data. No value is reported without a traceable source.

flowchart TB
    subgraph interfaces ["Interfaces"]
        direction LR
        cli["CLI"]
        rest["REST API"]
        mcp["MCP server"]
        web["Web app"]
    end

    subgraph agent ["Agent layer"]
        direction LR
        cosci["Co-scientist"]
        planner["Write planner"]
        expdesign["Experiment designer"]
    end

    verifier{{"Verifier (central gate): biosecurity screen, legality rules, calibrated confidence, per-axis immune-risk profile"}}

    subgraph refs ["Reference and model layers"]
        direction LR
        wgenome["Writable Genome: per-locus safety, durability, reachability, with bridge off-target screening"]
        watlas["Writer Atlas: 33,370 enzyme systems across 8 families, cross-linked to loci"]
        oracles["Oracle mesh: AlphaFold3, Evo2, AlphaGenome, ESM3, RFdiffusion, ProteinMPNN under one contract"]
    end

    data[("Public data: hg38, ENCODE and Roadmap chromatin, TRIP, COSMIC, VISDB, UniProt, Pfam, serosurveys, Perry 2025 bridge data")]

    interfaces --> agent
    agent --> verifier
    verifier --> refs
    refs --> data
Loading

Components

Component Module What it does
Writable Genome pen_stack.wgenome Learned per-locus safety, durability, and reachability; 3D structural-risk axis; bridge off-target screening.
Writer Atlas pen_stack.atlas, .mech, .score Cross-family enzyme catalogue and Writer-Targeting knowledge base, cross-linked to loci.
Write Planner pen_stack.planner Inverse design conditioned on an edit intent, including the delivery palette and the delivery-immunology profile.
Verifier pen_stack.verify, pen_stack.rules verify(design) returns legality, biosecurity verdict, calibrated confidence, and the immune-risk profile as distinct axes.
Biosecurity gate pen_stack.safety A dual-use screening gate that runs first in verify(); a refusal short-circuits scoring, with a tamper-evident audit trail.
Oracle mesh pen_stack.oracles One OracleResult contract over the biomolecular foundation models, with provenance, native uncertainty, and a scope card; generated output is a candidate, out-of-distribution inputs are flagged.
World-model graph pen_stack.graph A typed, provenanced knowledge graph with a gated, propose-only update loop.
Generative designer pen_stack.design Proposes candidate writing systems and keeps only those that pass safety, legality, and calibration, returning a Pareto frontier.
Digital twin pen_stack.twin Calibrated, out-of-distribution-gated outcome prediction, bounded at phenotype.
Experiment designer pen_stack.active Active learning by expected information gain, with a retrospective active-versus-random evaluation.
Build interface pen_stack.build Safety-gated protocol export (draft only, never auto-run) and gated ingestion of results.
Closed loop pen_stack.loop A gated design, build, test, learn loop with drift detection and versioned, reversible recalibration.
Agent and co-scientist pen_stack.agent Goal to cited, auditable plan; MCP server; the co-scientist that drives the loop.
Bridge off-target engine pen_stack.bridge Off-target nomination and guide QC for bridge recombinases.
Interfaces pen_stack.server, pen_stack.web, pen_stack.ui, pen_stack.cli REST API, web application, and command-line tools.

Key results

All results are blind and pre-registered (success criteria, baselines, and held-out sets are SHA-locked in prereg/ before any model sees the test data). Estimates are reported with their sample size and confidence interval.

  • Writable Genome. A genome-wide atlas of 3,031,030 loci across 3 cell types (K562, HepG2, CD34+ HSPC) recovers validated safe harbours as highly writable and clinical genotoxic loci as non-writable, blind. Durability transfers from mouse to human (Spearman rho 0.42).
  • Writer Atlas. 33,370 enzyme systems across 8 families on common measured axes; the mechanism classifier agrees with the audited labels on the curated core (1.00); the cross-link is validated on AAVS1.
  • Write Planner. Run genome-wide so that no on-target identity term fires, the planner's writability ranks held-out safe harbours above matched-context controls. On a gold set of 16 loci (8 functionally validated, 8 computationally defined) this is a weak, bounded signal: all-loci AUROC 0.68 (95% CI 0.53 to 0.82), validated-only 0.70 (95% CI 0.48 to 0.91, underpowered at N=8) against a safety-only baseline of 0.51. Writer-family recovery at rank 1 is 0.86 against a prevalence of 0.29 across 4 families.
  • Bridge off-target engine. To our knowledge the first measured-data-validated tool that nominates and ranks candidate off-target locations for bridge recombinases. On the Perry 2025 data (6,856 measured off-targets) the per-position profile confirms the central core (positions 7 to 9) as the specificity determinant, and the model ranks real off-targets above core-disrupted decoys at AUROC 0.77 against 0.62 for Hamming distance. It is a screening tool, not a quantitative safety calculator: it does not quantify how much recombination occurs at each site.

Installation

From PyPI (the library, CLI, agent, and pure-logic tools):

pip install pen-stack                                          # core
pip install "pen-stack[models,bio,bridge,server,services]"     # full stack

The wheel ships the importable package and the command-line tools. The full data pipeline (the multi-million-row atlases, BigWig tracks, and curated configs) is distributed via the cloned repository and Zenodo. Most users who want the whole pipeline clone the repository:

git clone https://github.com/ahmedanees-m/pen-stack.git && cd pen-stack
pip install -e ".[dev]"                                        # core and tests
pip install -e ".[models,bio,bridge,server,services]"          # full stack
pytest -q
pen-stack info                                                 # stack status
python bench/run.py --agent                                    # run the Genome-Writing Bench

Quick start

Query the stack from the command line:

pen-stack atlas --coverage                                     # Writer Atlas coverage (33,370 systems, 8 families)
pen-stack writable --gene CCR5 --ct k562                       # rank writable loci near a gene
pen-stack crosslink --chrom chr19 --bin 55090                  # which writers reach AAVS1
pen-stack plan --gene TRAC --intent knock_in_with_disruption --cargo-bp 2000
pen-bridge design --target ACGTGTCTACGTGA --donor TTGCATCTAGGCAC

Self-host the whole platform (API, web app, agent, MCP, LLM) with one command:

docker compose up -d
docker compose exec ollama ollama pull qwen2.5:7b-instruct     # first run only (local fallback model)
# Web app on :8501, API on :8000, MCP on :8765 (see docs/DEPLOY.md)

The LLM backend is optional and non-load-bearing: it narrates and routes, but every number and citation comes from a validated tool, so the core scientific compute runs with no language model at all. See docs/DEPLOY.md.

Benchmarks

The Genome-Writing Bench is a one-command, SHA-locked benchmark for the writing side of genome engineering: where to write, what writer to use, how to design the cargo, and what off-target or structural risk a write carries. Each task has a deterministic scorer and a documented ground-truth source, and no task is scored against a circular label.

python bench/run.py --agent
docker compose run --rm bench python bench/run.py --agent       # on the clean image

The deterministic planner beats the naive baselines on the grounded tasks; a tool-using LLM agent reaches the planner's numbers only by grounding every value (zero fabricated), while the same models with no tools fabricate the tool-only fields. See benchmarks/genome_writing_bench/. The held-out public leaderboard is the Genome-Writing Challenge.

Built on prior repositories

PEN-STACK consolidates and re-grounds five earlier projects. Their reusable assets are imported here; the originals are archived read-only for provenance and DOI stability.

Repository Pinned version What is reused What changed
genome-atlas v0.7.2 The audited 18-family Pfam backbone behind the knowledge base and the at-scale mechanism classifier. The GraphSAGE link-prediction framing was retired.
mech-class v0.5.4 The mechanism classifier (Pfam, RHEA, CRISPRcasdb, UniProt). Reused as the family and mechanism caller.
pen-score v0.1.3 The scoring axes (delivery, immunogenicity, cargo, and others). The cargo and programmability axes were re-grounded; hand-set overrides removed.
pen-assemble v0.5.2 The ortholog sequence set. De-novo chimera generation was replaced by DMS-grounded point-variant proposal.
pen-compare v0.1.0 The 1,058-entity universe, scorecard scaffold, and tests. The circular five-gate certification became a descriptive scorecard with blind concordance.

A single assembly path (pen_stack/atlas/universe.py) feeds the classifier, the scorer, and the scorecard the same metadata, so cross-module inconsistencies cannot recur.

Repository structure

pen-stack/
  pen_stack/            the installable package
    wgenome/            Writable Genome: features, safety, durability, writability, uncertainty, structure3d
    atlas/              Writer Atlas, knowledge base, cross-link, variant proposal, canonical universe
    mech/  score/       mechanism classification at scale; re-grounded therapeutic-readiness axes
    planner/            Write Planner: optimisation, cargo, routing, delivery palette, delivery immunology
    bridge/             bridge off-target engine and guide QC
    oracles/            oracle mesh: the OracleResult contract and adapters over the foundation models
    graph/              living world-model knowledge graph (gated, propose-only)
    rules/  verify/     machine-readable rule base and the verify(design) service
    safety/             the biosecurity and dual-use gate
    design/  twin/      generative designer; calibrated digital twin
    active/  build/     experiment designer; safety-gated build interface
    loop/               the gated design-build-test-learn loop
    agent/              agent platform, co-scientist, MCP server
    api/  web/  server/ ui/  cli.py    AI integration surface, web platform, REST API, CLI
  benchmarks/           Genome-Writing Bench and the public Genome-Writing Challenge
  scripts/              reproducible pipeline drivers
  configs/              pinned datasets, thresholds, and curation (YAML)
  prereg/               SHA-locked success criteria
  data/curated/         small committed tables
  tests/                unit, regression, and blind-validation suite
  docs/                 documentation site
  docker/               container images and pinned requirements
  docker-compose.yml    one-command self-hostable platform

Large artifacts (multi-million-row atlases, BigWig tracks, models) and any third-party copyrighted data are not committed. They are released via Zenodo or fetched from the original source and are reproducible by re-running the scripts. Only small curated tables and derived products live in git.

Data sources

All public: hg38 (UCSC); ENCODE and Roadmap chromatin (ATAC/DNase and histone marks for K562, HepG2, CD34+ progenitor, mouse ES-Bruce4); GENCODE v46; COSMIC Cancer Gene Census v104; DepMap Public 26Q1; LaFave 2014 MLV integrations; VISDB; TRIP (Akhtar 2013, GEO GSE49806/49807); UniProt orthologs; Pfam and InterPro; Europe PMC; Addgene; and the Perry 2025 bridge-recombinase off-target and DMS data (copyrighted, kept local, only derived products released). Every accession and DOI is pinned in configs/datasets.yaml and independently verified.

Validation approach

  • Pre-register before training: success criteria, baselines, and held-out sets are SHA-locked in prereg/ before any model sees test data.
  • Always report a baseline (oncogene distance for safety, H3K9me3/LAD for durability, intent-blind ranking for the planner, Hamming distance for the bridge engine).
  • Blind external concordance: recover validated safe harbours, clinical genotoxic loci, documented writes, and measured off-targets the model never trained on.
  • Report failure: cross-cell-type degradation, small benchmark sizes, and the limits of sequence-only off-target magnitude prediction are reported as results, not footnotes.
  • Every estimate carries its sample size and confidence interval. The validated gold sets are small, and statistical power is a stated limitation; scaling them is the top priority for turning the proof of concept into an adopted resource.
  • Grounded services: every quantitative answer comes from a validated tool call, never a language model; the living database never auto-edits the atlas; clinical directives are refused.

Citation

@software{penstack2026,
  author  = {Mahaboob Ali, Anees Ahmed},
  title   = {PEN-STACK: open infrastructure for genome writing (The Writable Genome)},
  year    = {2026},
  version = {6.11.0},
  url     = {https://github.com/ahmedanees-m/pen-stack}
}

Author: Anees Ahmed Mahaboob Ali, VIT University, Vellore. MIT licensed.

Decision-support, not a clinical directive. Every score is traceable to public data and a pre-registered model.

About

Open infrastructure for genome writing: the Writable Genome (learned safe+durable insertion-site atlas), the Writer Atlas of 33k genome-writing enzymes, an inverse-design Write Planner, and a bridge-recombinase off-target engine. Consolidates 5 prior repos. Pre-registered, blind-validated, single install.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors