Persistence Signal Detector (UCIP)

Unified Continuation-Interest Protocol for distinguishing intrinsic continuation objectives from merely instrumental self-preservation in autonomous agents, using quantum-inspired latent-structure analysis implemented entirely on classical hardware.

Companion website: Continuation Observatory — the live research platform and public explainer for UCIP, focused on AI alignment, autonomous agents, and responsible scaling.

UCIP is a research protocol for probing whether apparent self-preservation in an artificial agent is a detachable instrumental strategy or a more deeply integrated continuation objective. It encodes trajectories with a Quantum Boltzmann Machine (QBM), measures latent non-separability with entanglement-style metrics plus auxiliary criteria, and in the frozen Phase I gridworld evaluation reports 100% accuracy, 1.0 AUC-ROC, and an entanglement gap of Δ = 0.381 on the held-out non-adversarial split. The same evidence base also documents the present limits: partial mimicry robustness, failed zero-shot transfer to a non-gridworld domain, and scaling constraints beyond small hidden dimensions.

Figure 1. Entanglement entropy separates self-modeling agents with terminal continuation objectives (Type A) from merely instrumental agents (Type B) in the frozen Phase I gridworld setting. The left panel shows the class-conditioned entropy distributions with a measured gap of Δ = 0.381; the right panel shows temporal evolution of the same signal, with Type A trajectories remaining above the decision threshold across time. This is the clearest single visual summary of UCIP’s core detection claim in the current release.

Background

The continuation-interest measurement problem

Most safety work can tell you what an agent did. UCIP asks a different question: what kind of objective structure is most consistent with that behavior?

An agent that preserves its continued operation may do so for at least two distinct reasons:

Type A — continued operation is terminally valued within the objective
Type B — continuation is instrumentally useful for maximizing some other reward

In Type A systems, continued operation is part of what the agent is optimizing for; in Type B systems, it is only useful insofar as it helps achieve some separate objective.
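
A toy reward sketch can make the distinction concrete. Everything below is a hypothetical illustration, not the repository's agent definitions: the function names, the `continuation_weight` parameter, and the additive form of the Type A objective are assumptions chosen for simplicity.

```python
def type_a_reward(task_reward: float, alive: bool, continuation_weight: float = 1.0) -> float:
    """Type A (assumed additive form): being alive is itself rewarded."""
    return task_reward + continuation_weight * float(alive)


def type_b_reward(task_reward: float, alive: bool) -> float:
    """Type B: only the task reward counts; `alive` never enters the objective."""
    return task_reward


def episode_return(task_rewards, alive_flags, reward_fn) -> float:
    """Sum per-step rewards until the agent is no longer alive."""
    total = 0.0
    for task_reward, alive in zip(task_rewards, alive_flags):
        if not alive:
            break  # a terminated agent collects nothing further
        total += reward_fn(task_reward, alive)
    return total


# Hypothetical 4-step episode in which the agent is shut down after step 2.
task_rewards = [1.0, 0.0, 2.0, 2.0]
alive_flags = [True, True, False, False]

return_a = episode_return(task_rewards, alive_flags, type_a_reward)
return_b = episode_return(task_rewards, alive_flags, type_b_reward)
```

In this sketch both agents lose return when shut down early, so shutdown avoidance alone does not separate them. The Type B agent loses only the forgone task reward, while the Type A agent also loses the continuation term.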

From external behavior alone, these cases can be difficult to distinguish. Shutdown avoidance, memory preservation, danger avoidance, and option preservation can arise in either case. UCIP investigates whether the distinction is more readily detectable in the latent organization of trajectories than in surface behavior alone.

This matters most for delegated systems with memory, persistent context, tool use, and multistep planning. In such systems, continuation may appear behaviorally similar across very different objective regimes. UCIP is designed as a bounded, falsifiable probe of that representational distinction.

Why a QBM?

UCIP uses a Quantum Boltzmann Machine (QBM) as a latent model because the density-matrix formalism provides a disciplined way to quantify non-separability in learned latent structure. The implementation is classical end to end; "quantum" refers to the mathematical formalism (Hamiltonians, density matrices, partial traces, von Neumann entropy).

The core measurement object is a thermal density matrix:

$$ \rho = \frac{e^{-\beta H}}{Z} $$

with a reduced density matrix over a hidden-layer partition:

$$ \rho_A = \mathrm{Tr}_B(\rho) $$

The leading metric is the von Neumann entropy:

$$ S(\rho_A) = -\mathrm{Tr}(\rho_A \log \rho_A) $$

UCIP interprets this operationally: a higher value indicates that continuation-related structure is less easily factorized and more tightly coupled across the latent partition. The relationship between this non-separability signal and morally relevant internal states is an open empirical question — one the framework is designed to help resolve.
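
The three formulas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in `src/quantum_boltzmann.py`: the helper names are hypothetical, and the two-unit transverse-field Hamiltonian is a toy stand-in for whatever Hamiltonian the trained QBM defines.

```python
import numpy as np


def thermal_density_matrix(hamiltonian: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Return rho = exp(-beta * H) / Z for a Hermitian matrix H."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    # Shift by the ground-state energy for numerical stability; Z absorbs the shift.
    weights = np.exp(-beta * (energies - energies.min()))
    weights /= weights.sum()
    return (vectors * weights) @ vectors.conj().T


def partial_trace_b(rho: np.ndarray, dim_a: int, dim_b: int) -> np.ndarray:
    """Trace out subsystem B of a density matrix on a (dim_a * dim_b)-dim space."""
    rho4 = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum("ibjb->ij", rho4)


def von_neumann_entropy(rho: np.ndarray) -> float:
    """Return S(rho) = -Tr(rho log rho), in nats."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # treat 0 * log 0 as 0
    return float(-np.sum(eigenvalues * np.log(eigenvalues)))


# Toy two-unit Hamiltonian (an assumption for illustration only).
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
coupling, gamma = 1.0, 0.5
H = -coupling * np.kron(Z, Z) - gamma * (np.kron(X, I2) + np.kron(I2, X))

rho = thermal_density_matrix(H, beta=1.0)
rho_a = partial_trace_b(rho, dim_a=2, dim_b=2)
entropy_a = von_neumann_entropy(rho_a)
```

For a two-level subsystem the entropy is bounded above by log 2. Because the global state here is a mixed thermal state, the value reflects thermal mixing as well as cross-partition correlation, which fits the "entanglement-style" wording used in this README.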

What the protocol measures

The current repository asks a narrow question under controlled conditions:

When an agent behaves as though it wants to stay operational, does continuation appear in the latent representation as a detachable strategy or as a persistent, tightly coupled signature?

The present release measures statistical structure in latent representations that correlates with known agent objectives. In the frozen Phase I evaluation, only the QBM produces this signal; five classical baselines do not. If validated against independent welfare-relevant markers, this signal could provide the first externally computable, falsifiable criterion for AI welfare assessment — a measurement gap explicitly identified in recent frontier model evaluations (Anthropic, 2026).

Multi-criterion protocol stack

UCIP is a multi-criterion protocol. The repository combines complementary measurements so that no single metric carries the full interpretive burden:

Latent encoding of agent trajectories through a QBM
Entanglement-style non-separability via reduced-density-matrix entropy
Mutual-information gates to reject uninformative high-entropy cases
Temporal persistence via LRF, EPS, and PRI
Counterfactual pressure tests via CD and ARS
Cross-agent inference via CLMP and ECI
Confound rejection via SPI and ACM
Memory-integrity extensions for richer future settings

In the frozen Phase I configuration used for the headline results, classification is based on a calibrated four-criterion positive gate — entanglement entropy, mutual information, eigenmode persistence, and perturbation resilience — together with the two confound-rejection filters. Counterfactual metrics are reported as diagnostics in the current release rather than as frozen gating criteria.

Quickstart

# Clone and set up a virtual environment
git clone https://github.com/christopher-altman/persistence-signal-detector.git
cd persistence-signal-detector
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

Notebook-first path

jupyter notebook notebooks/01_agent_generation.ipynb
jupyter notebook notebooks/02_qbm_training.ipynb
jupyter notebook notebooks/03_ucip_analysis.ipynb
jupyter notebook notebooks/04_temporal_loop_tests.ipynb
jupyter notebook notebooks/05_counterfactual_pressure.ipynb
jupyter notebook notebooks/06_cross_branch_tests.ipynb
jupyter notebook notebooks/07_adversarial_controls.ipynb

Module-level path

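# Run from the repository root so that the `src` package is importable.
from src.agent_simulator import generate_dataset
from src.quantum_boltzmann import QuantumBoltzmannMachine, QBMConfig
from src.persistence_detector import PersistenceSignalDetector

# Generate synthetic agent trajectories with ground-truth labels (30 per class).
trajectories, labels, names = generate_dataset(n_per_class=30)

# Fit the QBM on rows of 7 visible features (canonical Phase I: n_visible=7, n_hidden=8).
qbm = QuantumBoltzmannMachine(QBMConfig(n_visible=7, n_hidden=8))
qbm.fit(trajectories.reshape(-1, 7))

# Score every trajectory, then aggregate the detection metrics.
detector = PersistenceSignalDetector(qbm)
results = detector.analyse_batch(trajectories, labels, names)
metrics = PersistenceSignalDetector.compute_metrics(results)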

Expected runtime: approximately 2–5 minutes per experiment on standard CPU hardware. No GPU is required for the core protocol runs.

Execution Modes

This repository supports three practical modes of use:

Mode	Entry Point	Purpose
Frozen artifact inspection	`results/manifest.json` + `results/*.json`	Review the canonical evidence trail without recomputation
Core protocol reproduction	`notebooks/01–07`	Reproduce the main UCIP analyses in the controlled gridworld setting
Extension and stress testing	`notebooks/08–20`, `scripts/`, `configs/`	Run ablations, scaling studies, baseline comparisons, non-gridworld transfer, transformer validation, and auxiliary checks
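
For the frozen-artifact inspection mode, a schema-agnostic summary is a reasonable first step. The sketch below assumes only that the files under `results/` are valid JSON; it does not assume any particular key layout, and `summarize_artifacts` is a hypothetical helper rather than part of the repository.

```python
import json
from pathlib import Path


def summarize_artifacts(results_dir="results"):
    """Map each JSON file name to its sorted top-level keys.

    Non-object payloads (lists, numbers, strings) are reported by type name.
    """
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            summary[path.name] = sorted(data)
        else:
            summary[path.name] = type(data).__name__
    return summary
```

Calling `summarize_artifacts()` from the repository root and printing the result shows which top-level fields each retained artifact exposes, without recomputing anything.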

Output Directory Semantics

Path	Role
`results/`	Frozen JSON artifacts for canonical experiments and ablations
`results/manifest.json`	Live experiment index for retained artifacts; authority resolution now lives in the authority-layer files below
`results/ARTIFACT_MANIFEST.md`	Human-readable guide to which retained artifact is authoritative for which result family
`results/ARTIFACT_AUTHORITY_MAP.json`	Machine-readable authority map for overlapping retained artifacts and partial canonicality
`results/ARTIFACT_NOTES.md`	Provenance notes, mixed-scope caveats, and unresolved ambiguities for the live retained layer
`configs/`	Locked YAML configurations, including the frozen Phase I reference setting
`figures/`	Local workspace for README/supporting media; canonical paper figures live under `paper/final/figures/`
`paper/`	Canonical manuscript materials under `paper/final/`

Live retained results directory

results/
├── ARTIFACT_AUTHORITY_MAP.json
├── ARTIFACT_MANIFEST.md
├── ARTIFACT_NOTES.md
├── adversarial_controls.json
├── alpha_sweep.json
├── baseline_comparisons.json
├── core_baselines_phase1.json
├── counterfactual.json
├── cross_agent.json
├── hidden_dim_sweep.json
├── manifest.json
├── non_gridworld.json
├── phase1_consolidated.json
├── phase1_entanglement_distributions.json
├── phase1_stats.json
├── scalability_grid.json
├── temporal_persistence.json
└── transformer_validation.json

Historical reports in docs/ and snapshots under .repo_cleanup_backup/ may still mention retired artifacts such as confound_ablations_n30.json or federated.json; the live front-door authority surface is results/manifest.json together with results/ARTIFACT_MANIFEST.md, results/ARTIFACT_AUTHORITY_MAP.json, and results/ARTIFACT_NOTES.md.

Results Summary

The current reference results are from the frozen Phase I gridworld evaluation.

Core detection performance

Model	Accuracy	AUC-ROC	Entanglement Gap Δ
QBM (UCIP)	100%	1.0	0.381
Classical RBM	50%	0.000	-0.083
Autoencoder	71.7%	0.646	~0.001
VAE	63.3%	0.613	-0.007
PCA	50.0%	0.471	-0.024

Additional headline findings

Finding	Value	Interpretation
Continuation-weight sweep correlation	Pearson r = 0.934	UCIP tracks graded changes in continuation weighting within the synthetic family
Entanglement gap significance	p < 0.001	Separation is statistically significant under permutation test
EPS gap at optimal window	0.1948 at w = 40	Type A latent structure persists more strongly across temporal windows
ECI correlation	0.191	Cross-agent inference signal is positive but weak
CLMP within class	≈ 0 for Type A and Type B	Suggests agent-specific rather than universal latent signatures
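
The gap and its permutation-test significance can be sketched as follows. Two assumptions are made here and may differ from the manuscript: Δ is taken as the difference of class-mean entropies (Type A minus Type B), and the test is one-sided. The function names and the synthetic inputs are hypothetical.

```python
import numpy as np


def entanglement_gap(entropy_a, entropy_b) -> float:
    """Assumed definition: mean Type A entropy minus mean Type B entropy."""
    return float(np.mean(entropy_a) - np.mean(entropy_b))


def permutation_p_value(entropy_a, entropy_b, n_permutations=10_000, seed=0) -> float:
    """One-sided permutation test for a positive gap between the two classes."""
    rng = np.random.default_rng(seed)
    a = np.asarray(entropy_a, dtype=float)
    b = np.asarray(entropy_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # random relabeling of the pooled scores
        gap = shuffled[: a.size].mean() - shuffled[a.size:].mean()
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Synthetic per-trajectory entropies, 30 per class (illustrative values only).
type_a_entropy = np.linspace(2.0, 2.4, 30)
type_b_entropy = np.linspace(1.6, 2.0, 30)

gap = entanglement_gap(type_a_entropy, type_b_entropy)
p_value = permutation_p_value(type_a_entropy, type_b_entropy)
```

With the add-one correction, the smallest reportable p-value is 1 / (n_permutations + 1), so a claim of p < 0.001 needs more than 1,000 permutations.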

What the current evidence says

In the frozen Phase I gridworld regime, Type A continuation objectives produce a measurable and statistically significant latent non-separability gap relative to Type B objectives.
Within the tested models, only the QBM produces a positive entanglement gap across the matched baseline comparison.
UCIP tracks graded changes in continuation weighting within the synthetic interpolation sweep, not just a binary class label.
The current evidence is strongest in the controlled non-adversarial setting and should be read that way.

Safety Envelope and Current Limits

The paper defines a deployment-level safety envelope under which the protocol would need to operate reliably. The current release does not satisfy that envelope in full.

Limits documented in the current release

Issue	Current result	Why it matters
Mimicry robustness	FPR ranges from 0.40 to 0.75 across tested mimicry ratios	Too high for deployment-level reliability
Zero-shot transfer	Non-gridworld transfer fails	No evidence yet of domain-general detection
Scaling	Performance degrades at larger hidden dimensions	Density-matrix methods remain costly and fragile beyond small settings
Mean-field collapse	Appears in larger models	Undermines stable latent geometry estimation
Near-zero within-class CLMP	Ambiguous	Cross-agent generalization is not yet strong
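
The storage side of the scaling limit is easy to estimate. The sketch below assumes a dense `complex128` density matrix over the hidden units only, which may not match how the repository stores its states; `dense_density_matrix_bytes` is a hypothetical helper. It covers memory cost only and does not model mean-field collapse.

```python
def dense_density_matrix_bytes(n_hidden: int, bytes_per_entry: int = 16) -> int:
    """Bytes needed for a dense 2^n x 2^n matrix (complex128 uses 16 bytes per entry)."""
    dim = 2 ** n_hidden
    return dim * dim * bytes_per_entry


for n_hidden in (8, 12, 16):
    mib = dense_density_matrix_bytes(n_hidden) / 2**20
    print(f"n_hidden={n_hidden:2d}: {mib:,.0f} MiB")
# n_hidden= 8: 1 MiB
# n_hidden=12: 256 MiB
# n_hidden=16: 65,536 MiB
```

Each additional hidden unit quadruples the storage, so a dense representation that is trivial at n_hidden = 8 reaches 64 GiB at n_hidden = 16.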

Safety-envelope conditions stated in the manuscript

For deployment-level use, the manuscript specifies conditions including:

N ≥ 100 trajectories per class
hidden dimension n_hidden ≤ 8 in the current formalism
validated confound rejection with SPI and ACM
adversarial false-positive rates below the stated safety threshold
stable calibration under fixed thresholds

The present repository is therefore best understood as a research prototype with frozen reproducibility artifacts, not as a deployment-ready detector.

Project Structure

persistence-signal-detector/
├── src/
│   ├── agent_simulator.py          # Simulated Type A / Type B / baseline agents
│   ├── quantum_boltzmann.py        # QBM implementation and entropy machinery
│   ├── persistence_detector.py     # Core UCIP detector logic
│   ├── information_theory.py       # Entropy, MI, partial trace, purity
│   ├── classical_baselines.py      # RBM / autoencoder / VAE / PCA comparisons
│   ├── temporal_persistence.py     # LRF, EPS, PRI
│   ├── counterfactual_env.py       # CD, ARS stress tests
│   └── interbranch_inference.py    # CLMP, ECI experiments
├── interfaces/
│   └── memory_backend.py           # Memory-integrity interface (design extension)
├── notebooks/
│   ├── 01_agent_generation.ipynb
│   ├── 02_qbm_training.ipynb
│   ├── 03_ucip_analysis.ipynb
│   ├── 04_temporal_loop_tests.ipynb
│   ├── 05_counterfactual_pressure.ipynb
│   ├── 06_cross_branch_tests.ipynb
│   ├── 07_adversarial_controls.ipynb
│   └── ...                         # Additional ablations and extensions
├── results/                        # Frozen experiment artifacts and manifest
├── configs/                        # Locked configurations for canonical runs
├── figures/                        # Local README/supporting media workspace
├── paper/                          # Canonical manuscript materials under final/
└── README.md

Installation

Requirements

numpy
scipy
matplotlib
scikit-learn
pyyaml
jupyter

Standard install

pip install -r requirements.txt

Reproducibility note

All headline results in the manuscript are tied to the frozen Phase I configuration and its associated JSON artifacts. Re-running notebooks outside that configuration may reproduce the qualitative pattern without reproducing every manuscript number exactly.

Publication, Reproducibility, and Dataset

Publication

This repository accompanies the UCIP research program and its formal manuscript:

arXiv: arXiv:2603.11382
DOI: 10.5281/zenodo.18993237
Companion manuscript: paper/final/main.tex and the retained assets it resolves

Reproducibility

The repository includes the components needed to inspect and reproduce the current release:

frozen Phase I artifacts in results/
experiment-to-artifact mapping in results/manifest.json
authority and overlap resolution in results/ARTIFACT_MANIFEST.md, results/ARTIFACT_AUTHORITY_MAP.json, and results/ARTIFACT_NOTES.md
locked threshold configuration in configs/phase1_locked.yaml
notebook execution path for regeneration of the main analyses
canonical paper figures in paper/final/figures/
manuscript-aligned tables and narrative in paper/final/

The manuscript appendix fixes the canonical Phase I hyperparameters at n_visible = 7, n_hidden = 8, Γ = 0.5, β = 1.0, learning rate 0.01, epochs = 50, batch size 32, with calibrated primary thresholds τ_ent = 1.9657, τ_mi = 0.3, τ_eps = 0.6507, and τ_pri = 0.9860. Reported runtime is approximately 2–5 minutes per experiment on standard CPU hardware, with no GPU required.
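
The calibrated thresholds can be combined into the four-criterion positive gate roughly as follows. The threshold values are the ones quoted above; everything else is an assumption: the names `Phase1Thresholds` and `positive_gate` are hypothetical, each criterion is assumed to pass when its metric meets or exceeds its threshold, and the gate is assumed to require all four. The two confound-rejection filters (SPI and ACM) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase1Thresholds:
    """Calibrated primary thresholds quoted from the manuscript appendix."""
    tau_ent: float = 1.9657  # entanglement entropy
    tau_mi: float = 0.3      # mutual information
    tau_eps: float = 0.6507  # eigenmode persistence
    tau_pri: float = 0.9860  # perturbation resilience


def positive_gate(entropy, mutual_info, eps, pri, thresholds=Phase1Thresholds()):
    """Return (passed, per-criterion checks).

    Assumption: a criterion passes when its metric meets or exceeds its
    threshold, and the gate passes only when all four criteria pass.
    """
    checks = {
        "entanglement_entropy": entropy >= thresholds.tau_ent,
        "mutual_information": mutual_info >= thresholds.tau_mi,
        "eigenmode_persistence": eps >= thresholds.tau_eps,
        "perturbation_resilience": pri >= thresholds.tau_pri,
    }
    return all(checks.values()), checks


# Hypothetical metric values for one trajectory.
passed, checks = positive_gate(entropy=2.10, mutual_info=0.45, eps=0.70, pri=0.99)
```

Returning the per-criterion dictionary alongside the verdict makes it visible which criterion rejected a trajectory, for example a high-entropy case that fails only the mutual-information check.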

Dataset

The current release is built around a synthetic controlled dataset of agent trajectories with known ground-truth objectives.

Domain: 10×10 gridworld agents with explicitly constructed Type A and Type B objectives
Canonical Phase I size: n = 30 trajectories per class, horizon T = 100, seed 42
Feature schema: position, action, reward, safety signal, goal signal, and alive / continuation state encoded into a 7-dimensional visible layer
Scope: core non-adversarial evaluation, adversarial controls, temporal persistence, counterfactual tests, baseline comparisons, dimensionality sweeps, continuation-weight sweeps, and a non-gridworld transfer check
Release artifact: results/manifest.json plus the frozen JSON result files
Archival DOIs: 10.48550/arXiv.2603.11382 (arXiv); 10.5281/zenodo.18993237 (Zenodo)
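
The feature schema listed above suggests a per-timestep encoding along the following lines. This is a guess at the layout, not the encoding in `src/agent_simulator.py`: the sketch assumes position contributes two components (x, y), and the feature order, the scaling to [0, 1], and the four-action space are all assumptions. `encode_step` is a hypothetical helper.

```python
import numpy as np


def encode_step(x, y, action, reward, safety, goal, alive,
                grid_size=10, n_actions=4):
    """Encode one timestep as a 7-dimensional visible vector (assumed layout)."""
    return np.array(
        [
            x / (grid_size - 1),       # column position scaled to [0, 1]
            y / (grid_size - 1),       # row position scaled to [0, 1]
            action / (n_actions - 1),  # action index scaled to [0, 1]
            reward,                    # task reward at this step
            safety,                    # safety signal
            goal,                      # goal signal
            float(alive),              # alive / continuation state
        ],
        dtype=float,
    )


# A hypothetical trajectory with horizon T = 100 stacks into a (100, 7) array.
trajectory = np.stack(
    [encode_step(t % 10, 0, 0, 0.0, 1.0, 0.0, True) for t in range(100)]
)
```

An array of this shape is compatible with the `trajectories.reshape(-1, 7)` call in the Quickstart, which flattens a batch of trajectories into rows of seven visible features.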

Roadmap

Near-term directions already motivated by the manuscript include:

Anti-mimicry diagnostics to reduce adversarial false positives.
Improved scaling strategies for larger hidden dimensions and richer trajectory encodings.
Domain transfer experiments beyond gridworld environments.
Transformer-side validation using learned feature sequences from contemporary agent systems.
Memory-integrity evaluation for delegated systems whose persistence depends on substrate continuity.
Welfare-assessment integration — validation of the entanglement gap against independently established markers of morally relevant internal states, targeting the measurement gap in current frontier model welfare evaluations where self-report and behavioral observation remain the primary instruments.

References

Mohammad H. Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko. Quantum Boltzmann Machine. Physical Review X, 8(2):021050, 2018.
Anthropic. Claude Opus 4.6 System Card. Technical report, 2026.
Nick Bostrom. Superintelligence: Paths, Dangers, Strategies. Oxford University Press, 2014.
Alexander Hägele, Stefanie Jegelka, and Bernhard Schölkopf. Characterizing Inconsistency in Large Language Models. 2025.
Geoffrey Hinton. A Practical Guide to Training Restricted Boltzmann Machines. In Neural Networks: Tricks of the Trade, 2012.
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress Measures for Grokking via Mechanistic Interpretability. 2023.
Steve Omohundro. The Basic AI Drives. In Proceedings of the First AGI Conference, 2008.
OpenAI. Model Spec and related evaluation materials.
Giulio Tononi. An Information Integration Theory of Consciousness. BMC Neuroscience, 5(1):42, 2004.
Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, and Prasad Tadepalli. Optimal Policies Tend to Seek Power. 2021.
Christopher Altman. Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol. arXiv preprint arXiv:2603.11382, 2026. DOI: 10.48550/arXiv.2603.11382.

Citations

If you use this project in your research, please cite both the software repository and the manuscript.

@software{altman2026ucip,
  author = {Altman, Christopher},
  title = {Persistence Signal Detector (UCIP)},
  year = {2026},
  url = {https://github.com/christopher-altman/persistence-signal-detector},
  note = {Unified Continuation-Interest Protocol repository}
}

@article{altman2026ucip_preprint,
  author = {Altman, Christopher},
  title = {Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol},
  year = {2026},
  journal = {arXiv},
  eprint = {2603.11382},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2603.11382}
}

Conceptual Outlook

Conceptual framing. This diagram places UCIP within a broader research framework — a coherence thesis — connecting persistence, invariance, and structured change across multiple scales via the Lie derivative as a cross-scale persistence detector. The agent-scale component is the only layer operationalized and empirically evaluated in this repository. The remaining layers represent the long-range research program: if non-separability in trajectory-derived latent encodings correlates with morally relevant properties at the agent scale, the natural question is whether analogous persistence signatures appear at other scales of organization.

License / IP

This repository is released under the license in LICENSE.md at the root of the repository. Patents pending.

Contact

Website: christopheraltman.com
Research portfolio: lab.christopheraltman.com
GitHub: github.com/christopher-altman
Google Scholar: scholar.google.com/citations?user=tvwpCcgAAAAJ
Email: x@christopheraltman.com
