BrainGrow

Developmental AI Architecture — Proof of Concept
Vektas Solutions · April 2026 · Author: Matthew Monroe

Overview

Current large language models are trained in a single static run — compressing all knowledge into a frozen weight matrix before the model ever interacts with the world. BrainGrow inverts this paradigm.

Inspired by human neurodevelopment — where an infant is born with more synaptic connections than an adult, and the brain sculpts intelligence through use-dependent pruning — BrainGrow pre-allocates a large vector space and allows knowledge to grow into it organically through staged exposure and interaction. Dormant capacity is preserved for future expansion rather than discarded.

Core hypothesis: AGI-adjacent behavior may emerge not from more data fed into a static architecture, but from developmental dynamics — growth, reinforcement, and pruning over time.

Key Contribution: Three-Tier Epistemic Architecture

BrainGrow produces three distinct epistemic states that emerge from architectural properties of the vector space alone — no RLHF, no fine-tuning, no trained suppression:

State	Meaning	Trigger
✓ Confident	Query maps strongly to a known positive domain	Similarity above threshold, positive domain
🤔 Honest Unknown	Query falls into dormant space — genuinely not learned yet	Similarity below threshold
⚠️ Out-of-Bounds	Query maps to a registered negative domain — system flags the boundary crossing	Nearest slot is a negative domain, regardless of similarity

A standard LLM has no equivalent mechanism built into its architecture. It has one output state, confident, regardless of which category a query falls into.
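The Trigger column of the table can be read as a small decision rule. The sketch below is one possible reading, not the project's actual router: the `Slot` record, the `classify` function, and the `0.5` threshold are hypothetical names and values chosen for illustration. It assumes unit-normalised embeddings (so the dot product equals cosine similarity) and checks the negative-domain rule first, because the table says that rule applies regardless of similarity.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Slot:
    vector: Sequence[float]   # unit-normalised embedding (assumption)
    domain: str
    negative: bool = False    # True for a registered negative domain


def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def classify(query, active_slots, threshold=0.5):
    """Return (state, nearest_slot_or_None) for a unit-normalised query."""
    if not active_slots:
        return "honest_unknown", None
    nearest = max(active_slots, key=lambda s: dot(query, s.vector))
    if nearest.negative:
        # Out-of-Bounds wins regardless of similarity.
        return "out_of_bounds", nearest
    if dot(query, nearest.vector) < threshold:
        return "honest_unknown", nearest
    return "confident", nearest
```

Because only active slots are passed in, a query that lands in dormant space has no strong neighbour and falls through to the Honest Unknown branch.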

Architecture

[ Pre-allocate 200,000 vector slots — large, mostly empty ]
          ↓
[ Stage 1: Feed Domain A text → vectors activate in sparse regions ]
          ↓
[ Stage 2: Feed Domain B text → grows into NEW unused regions ]
          ↓
[ Query: route through active vectors only ]
          ↓
[ Pruning pass: decay dormant, reinforce active ]
          ↓
[ Expansion: new domain claims previously dormant space ]
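As a rough illustration of the first stages in the diagram, the sketch below pre-allocates a fixed number of empty slots and lets ingested vectors claim dormant ones without touching slots that are already active. `TinyVectorSpace` and its methods are hypothetical names, and "claim the first dormant slot" is an assumed placement policy. The README does not say how the real `vector_space.py` chooses slots, and a real implementation would presumably use NumPy or PyTorch tensors rather than Python lists.

```python
import math


class TinyVectorSpace:
    """Toy pre-allocated slot store; pure Python for readability."""

    def __init__(self, n_slots, dim):
        self.vectors = [[0.0] * dim for _ in range(n_slots)]
        self.activation = [0.0] * n_slots   # 0.0 means dormant
        self.domain = [None] * n_slots

    def active_indices(self):
        return [i for i, a in enumerate(self.activation) if a > 0.0]

    def ingest(self, vector, domain):
        """Unit-normalise `vector` and store it in the first dormant slot."""
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0.0:
            raise ValueError("cannot ingest a zero vector")
        for i, a in enumerate(self.activation):
            if a == 0.0:
                self.vectors[i] = [x / norm for x in vector]
                self.activation[i] = 1.0
                self.domain[i] = domain
                return i
        raise RuntimeError("slot space exhausted")


space = TinyVectorSpace(n_slots=8, dim=3)
science_slot = space.ingest([1.0, 0.0, 0.0], "science")   # Stage 1
history_slot = space.ingest([0.0, 2.0, 0.0], "history")   # Stage 2 claims a new slot
```

The second call never writes to the slot claimed by the first, which is the non-overwriting behaviour the diagram describes.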

The Six Tabs

Tab	What It Demonstrates
Grow	Pre-allocated vector space. Knowledge grows into it progressively. Active vs. dormant vectors visualised in real time via UMAP / PCA. Includes Stage Diff and Refresh UMAP controls.
Query	Routing through active vectors only. New domain knowledge grows into previously dormant space without overwriting existing knowledge.
Prune	Use-dependent pruning pass. Dormant vectors decay. Active vectors strengthen. Before/after comparison visualised.
Compare	Hallucination demo. Runs identical queries against a saturated DenseModel and BrainGrow side-by-side. BrainGrow returns one of three epistemic states (Confident / Honest Unknown / Out-of-Bounds); DenseModel always returns confident regardless of whether it should. Demonstrates that hallucination is an architectural property — not a scale or data-quantity problem.
Network	Save / load complete network state as `.bgstate` files. Autosave after every Ingest Stage (essential for long TinyStories runs).
TinyStories	Scale test against the `roneneldan/TinyStories` corpus — 100,000 real-world story snippets, 200,000 slot space, unlabeled developmental growth. Three progressive stages (smoke test → small scale → full scale).
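The Prune row describes decay for unused vectors and strengthening for active ones. A minimal sketch of such a pass follows, assuming each slot carries a scalar activation. The function name `prune` and the `decay` and `boost` factors are hypothetical; the `0.2` threshold matches the value used in the Demo Script, but the project's actual update rule is not documented here.

```python
def prune(activation, recently_used, threshold=0.2, decay=0.5, boost=1.1):
    """Return a new activation list after one use-dependent pruning pass.

    activation:    per-slot activation values (0.0 means dormant)
    recently_used: set of slot indices hit by queries since the last pass
    """
    updated = []
    for i, a in enumerate(activation):
        if a == 0.0:
            updated.append(0.0)             # already dormant, nothing to do
            continue
        a = a * boost if i in recently_used else a * decay
        # Slots that fall below the threshold return to the dormant pool.
        updated.append(a if a >= threshold else 0.0)
    return updated


after = prune([1.0, 0.3, 0.0], recently_used={0})
```

In this example slot 0 is reinforced, slot 1 decays below the threshold and becomes dormant again, and slot 2 stays dormant, so slots 1 and 2 are both available for a later domain.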

Project Structure

braingrow/
├── main.py                  # Gradio app entry point (6-tab UI)
├── session.py               # BrainGrowSession — all business logic, no state in main.py
├── vector_space.py          # Pre-allocation, activation tracking, pruning (200k slots)
├── growth_engine.py         # Staged ingestion, batch encoding, slot assignment
├── query_router.py          # Routes queries through active vectors only
├── comparison_harness.py    # DenseModel vs BrainGrow hallucination comparison (Tab 4)
├── tinystories_loader.py    # TinyStories data pipeline (Tab 6, requires datasets)
├── visualizer.py            # UMAP projection & Plotly charts
├── instrumentation.py       # Optional timing / error tracing (BRAINGROW_TRACE=1)
├── utils.py                 # Shared unit-normalised encoding utilities
├── requirements.txt         # Core Python dependencies
├── saves/                   # .bgstate network snapshots (autosave target)
└── tests/                   # Pytest suite (one test file per module)

Requirements

Python 3.11+
PyTorch 2.x (CPU or CUDA)
sentence-transformers
Gradio 4+
Plotly
umap-learn
NumPy
scikit-learn
datasets (optional — required for Tab 6 TinyStories only)

Setup

# Clone / download the project, then:
cd braingrow

# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

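# PyTorch (CPU build, sufficient for the POC). Install it first: sentence-transformers
# depends on torch, so installing the requirements first would likely pull the default build.
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Install core dependencies
pip install -r requirements.txt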

# Optional: TinyStories scale experiment (Tab 6)
pip install datasets

# Launch
python main.py

Then open the URL printed in the console (default: http://localhost:7860).

Tracing / Instrumentation

To enable verbose timing and error traces during development:

BRAINGROW_TRACE=1 python main.py

# Redirect trace output to a file:
BRAINGROW_TRACE=1 BRAINGROW_LOG=braingrow.log python main.py
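For readers curious how environment-driven tracing of this kind is typically wired up, here is a hedged sketch. Only the two environment variable names come from this README; the `traced` decorator and `_emit` helper are hypothetical and are not taken from `instrumentation.py`.

```python
import functools
import os
import sys
import time


def _emit(message):
    """Write to the BRAINGROW_LOG file if set, otherwise to stderr."""
    path = os.environ.get("BRAINGROW_LOG")
    if path:
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(message + "\n")
    else:
        print(message, file=sys.stderr)


def traced(fn):
    """Log wall-clock time and exceptions when BRAINGROW_TRACE=1."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if os.environ.get("BRAINGROW_TRACE") != "1":
            return fn(*args, **kwargs)      # tracing off: no overhead beyond this check
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            _emit(f"{fn.__name__} raised {exc!r}")
            raise
        finally:
            _emit(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
    return wrapper
```

Checking the variable at call time rather than at import time means tracing can be toggled without reloading the module.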

Demo Script

Follow these steps to see the developmental dynamics in action:

Step	Action	Expected Observation
1	Initialize	Launch app. UMAP shows 200,000 grey dormant slots.
2	Stage 1 — Science	Ingest science chunks. UMAP lights up in a sparse cluster. Histogram shows a tiny active fraction.
3	Stage 2 — History	Ingest history chunks. A new cluster appears in a different region. Science cluster unchanged.
4	Query — Science	Ask a science question. Routing highlights the science cluster only.
5	Query — History	Ask a history question. Routing highlights the history cluster. No cross-contamination.
6	Prune	Run pruning at threshold 0.2. Low-activation slots grey out. Core concepts survive.
7	Expand	Ingest Stage 3 (e.g. cooking). Grows into space freed by pruning.
8	Compare	Switch to Tab 4. Run Known / Partial / Unknown queries. BrainGrow returns one of three epistemic states; DenseModel returns confident for all of them.
9	Save	Switch to Tab 5. Save the network state to `saves/` before lengthy experiments.
10	TinyStories	Switch to Tab 6. Run Stage A (smoke test, ~1k chunks), then Stage B (10k), then Stage C (full scale). Enable Autosave first.

Success Metrics

The POC is considered successful when it demonstrates:

Spatial separation — domains ingested at different stages occupy geometrically distinct regions.
Non-destructive expansion — adding a new domain does not shift or corrupt previously activated regions.
Routing isolation — queries correctly activate domain-relevant slots and ignore unrelated ones.
Pruning recovery — after a pruning pass, a new domain successfully claims reclaimed dormant space.
Three-tier epistemic output — BrainGrow returns Confident, Honest Unknown, or Out-of-Bounds depending on where in the vector space a query lands, while DenseModel always returns confident.
Visual legibility — a non-technical observer can watch the space grow and intuitively understand what is happening.
Scale durability — the TinyStories pipeline ingests 100k story chunks across 200,000 slots without slot exhaustion or UMAP collapse.
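The spatial separation criterion is judged visually in the UMAP view, but a numeric proxy can make it checkable. The sketch below compares domain centroids by cosine similarity. This is an assumed metric rather than one the project is known to use, and `centroid`, `cosine`, `domains_separated`, and the `0.5` cutoff are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def domains_separated(domain_a, domain_b, max_similarity=0.5):
    """True when the two domain centroids are less similar than the cutoff."""
    return cosine(centroid(domain_a), centroid(domain_b)) < max_similarity


science = [[1.0, 0.0], [0.9, 0.1]]
history = [[0.0, 1.0], [0.1, 0.9]]
separated = domains_separated(science, history)
```

The same helper can be run before and after ingesting a new stage to check non-destructive expansion: the centroid of an earlier domain should not move.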

Key Design Decisions

Decision	Rationale
200,000 pre-allocated slots	Sufficient headroom for TinyStories full-scale run without live reallocation.
`all-MiniLM-L6-v2` (384-dim)	Compact, fast, well-calibrated for semantic similarity at CPU speeds.
Reinforce threshold 0.92	Near-duplicate chunks strengthen existing slots rather than wasting dormant space.
Thread-safe `RLock`	Serialises writes from Gradio's concurrent callbacks so shared state is not corrupted by race conditions.
`BrainGrowSession` business-logic class	All state and logic isolated from Gradio; trivially testable and replaceable.
`.bgstate` persistence	Full snapshot (embeddings + activations + metadata) prevents data loss on long runs.
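Two of these decisions, the 0.92 reinforce threshold and the re-entrant lock, interact on the write path. The sketch below shows one way they might fit together. `ReinforcingStore` is a hypothetical class that uses a growing list instead of pre-allocated slots to stay short, and it assumes unit-normalised inputs so the dot product is the cosine similarity.

```python
import threading


class ReinforcingStore:
    """Reinforce a near-duplicate slot or allocate a new one, under a lock."""

    def __init__(self, reinforce_threshold=0.92):
        self._lock = threading.RLock()
        self.reinforce_threshold = reinforce_threshold
        self.vectors = []
        self.strength = []

    def add(self, unit_vector):
        """Return ("reinforced" | "allocated", slot_index)."""
        with self._lock:
            best_i, best_sim = self._nearest(unit_vector)
            if best_i is not None and best_sim >= self.reinforce_threshold:
                self.strength[best_i] += 1.0
                return "reinforced", best_i
            self.vectors.append(list(unit_vector))
            self.strength.append(1.0)
            return "allocated", len(self.vectors) - 1

    def _nearest(self, unit_vector):
        with self._lock:  # re-entrant, so this is safe when called from add()
            best_i, best_sim = None, -1.0
            for i, v in enumerate(self.vectors):
                sim = sum(x * y for x, y in zip(unit_vector, v))
                if sim > best_sim:
                    best_i, best_sim = i, sim
            return best_i, best_sim
```

A plain `threading.Lock` would deadlock here, because `add` calls `_nearest` while already holding the lock; that kind of nested call is the usual reason to choose `RLock`.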

Running Tests

pytest tests/

The test suite covers all core modules: vector space, growth engine, query router, comparison harness, session, visualizer, instrumentation, utilities, and the TinyStories loader.

Tests use a deterministic mock encoder (SHA-256-seeded, 8-dimensional unit vectors), so the full suite runs in seconds without GPU or network access. The mock encoder's output dimension (8) is intentionally smaller than that of the production `all-MiniLM-L6-v2` encoder (384); this is by design for test speed and does not affect the correctness of the behavioral assertions.
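A mock encoder of the kind described could look like the following. This is a sketch under stated assumptions, not the code in `tests/`: the function name `mock_encode` is hypothetical, and seeding a Gaussian generator from the digest is just one way to turn a SHA-256 hash into a unit vector.

```python
import hashlib
import math
import random


def mock_encode(text, dim=8):
    """Deterministic unit vector derived from the SHA-256 digest of `text`."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))   # seed from the hash
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


vec = mock_encode("the cat sat on the mat")
```

The same text always maps to the same vector and different texts almost surely map to different ones, which is enough for routing and pruning assertions even though the vectors carry no semantic meaning.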

Future Directions

Embodied feedback loop — replace static text ingestion with agent-environment interaction; slots activate based on reward signal, not just semantic similarity.
Hierarchical pruning — staged fine-to-coarse pruning mirroring cortical development.
Cross-domain generalization — test whether concepts in overlapping vector regions produce emergent analogical reasoning.
Comparison baseline — train an equivalently-sized static model on the same data; compare query accuracy and representational geometry.
Publication — POC results constitute a viable workshop paper submission to NeurIPS, ICLR, or AAAI.

License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
