For LLMs, the externally writable in-context token sequence IS the cognitive substrate for category-use — the only handle that is movable from outside the weights. A position paper paired with a pre-specified multi-model probe study (3 frontier models × 10 low-attestation coined targets plus 2 positive controls; 36 model×term cells, 108 trials, 324 paired distinguishability measurements; panel-vs-author Cohen's κ = +0.71) testing whether introducing a one-sentence definition widens the model's vocabulary boundary.
Author: Alexandru Mares — allemaar.com ORCID: 0009-0009-6713-9780 Version: 1.0.0 Status: Published — 2026-05-13 License: CC-BY-4.0
Zenodo: zenodo.org/records/20157153 Concept DOI (always resolves to latest version): 10.5281/zenodo.20157152
We argue that for a large language model (LLM) the externally writable in-context token sequence IS the cognitive substrate for category-use — not a representation OF cognition that runs on some deeper substrate, but the substrate itself, the only handle that is movable from outside the weights. We call this position the Token-Substrate Hypothesis (TSH). The strong form of Sapir–Whorf was rejected for humans because humans have prelinguistic cognition, the Off-Token Route; LLMs do not, and for them Wittgenstein's Tractatus 5.6 stops being metaphor and becomes architecture. We test TSH with a methodology we call the Coinage Probe: a paired-trial elicitation that scores an LLM's distinguishability on a coined term against named near-neighbors before and after introducing a one-sentence canonical definition. Across 3 cross-vendor frontier models (Claude Opus 4.7, GPT-5.5, Gemini 2.5 Pro) and 10 low-attestation coined targets plus 2 positive controls, we ran 108 trials × 3 near-neighbors per trial = 324 paired distinguishability measurements across 36 model×term cells, scored by a three-judge panel and compared against an author-rated 22-trial audit sample (panel-vs-author Cohen's κ = +0.71). Mean cell-level Lexical Reachability (post minus cold) was +5.47 on a 9-point scale (95% CI: +5.13, +5.80; cell-level Cohen's d_cell = +3.95 across n = 30 novel model×term cells). The effect replicated across the panel (cross-model CV = 0.109) with a model-style interaction qualifying strict invariance, did not persist into a re-cold chat (H3 supported), and was absent on positive controls (H4 supported). These results support a bounded version of the Token-Substrate Hypothesis: in-context vocabulary functions as an externally writable substrate for LLM category use. For deployed LLM systems, notation is therefore not mere packaging; it is a design surface that shapes what distinctions the system can reliably use.
- Paper (PDF):
paper.pdf(rendered at v1.0.0 release) - Paper (Markdown source):
paper.md - Figures:
figures/— Three infographic plates (fig1-infographic-wittgenstein-flip.png,fig2-infographic-coinage-probe-shape.png,fig3-infographic-boundary-shift.png) and three analysis figures (fig4-lr-strip.png,fig5-model-boxplots.png,fig6-stratum-boxplots.png). - Changelog:
changelog.md
@misc{mares2026tsh,
author = {Mares, Alexandru},
title = {The Limits of My Tokens: The Token-Substrate Hypothesis and the Coinage Probe},
year = {2026},
publisher = {Zenodo},
version = {1.0.0},
doi = {10.5281/zenodo.20157153},
url = {https://doi.org/10.5281/zenodo.20157153}
}GitHub also renders a "Cite this repository" button from CITATION.cff.
This is a position-and-empirical paper. The Coinage Probe protocol is fully specified in the paper body (§3) and a separate pre-specification document (Appendix C). Per-trial verbatim transcripts and the panel-and-human-rater scores are deposited with the Zenodo release. Independent reproduction requires API access to the three frontier models listed (Claude Opus 4.7, GPT-5.5, Gemini 2.5 Pro) and adherence to the pre-specified judging rubric and seed (0x4D29_8B1F_6E07_C3A2).
This paper is paired with Elastic Automators: A Diagnostic Vocabulary for Language-Model-Driven Workflow Systems (Mares 2026; Zenodo DOI 10.5281/zenodo.19802018). EA names what these systems do; TSH names what their cognition runs on. The follow-up Token-Bound Automation paper bridges the two via the claim that every elastic automator is architecturally a token-bound automator.
See allemaar/papers for the full research program across TK, YON, TSH, SEN, EA, and SAI domains.
The paper text and figures are released under CC-BY-4.0. Any code in subdirectories is released under MIT unless stated otherwise.
